Integrating Machine Learning with Human Knowledge

Changyu Deng 1, Xunbi Ji 1, Colton Rainey 1, Jianyu Zhang 1, Wei Lu 1 2

1.Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109, USA

2.Department of Materials Science & Engineering, University of Michigan, Ann Arbor, MI 48109, USA

Highlights

  • Integrating knowledge into machine learning delivers superior performance

  • Knowledge is categorized and its representations are presented

  • Various methods to bridge human knowledge and machine learning are shown

  • Suggestions on approaches and perspectives on future research directions are provided

Summary

Machine learning has been heavily researched and widely used in many disciplines. However, achieving high accuracy requires a large amount of data that is sometimes difficult, expensive, or impractical to obtain. Integrating human knowledge into machine learning can significantly reduce data requirement, increase reliability and robustness of machine learning, and build explainable machine learning systems. This allows leveraging the vast amount of human knowledge and capability of machine learning to achieve functions and performance not available before and will facilitate the interaction between human beings and machine learning systems, making machine learning decisions understandable to humans. This paper gives an overview of the knowledge and its representations that can be integrated into machine learning and the methodology. We cover the fundamentals, current status, and recent progress of the methods, with a focus on popular and new topics. The perspectives on future directions are also discussed.

Subject Areas

Computer Science

Artificial Intelligence

Human-Centered Computing

Introduction

Machine learning has been heavily researched and widely used in many areas from object detection (Zou et al., 2019) and speech recognition (Graves et al., 2013) to protein structure prediction (Senior et al., 2020) and engineering design optimization (Deng et al., 2020; Gao and Lu, 2020; Wu et al., 2018). The success is grounded in its powerful capability to learn from a tremendous amount of data. However, it is still far from achieving intelligence comparable to humans. As of today, there have been few reports on artificial intelligence defeating humans in sensory tasks such as image recognition, object detection, or language translation. Some skills are not acquired by machines at all, such as creativity, imagination, and critical thinking. Even in the area of games where humans may be beaten, machines behave more like diligent learners than smart ones, considering the amount of data they require and the energy they consume. What is worse, purely data-driven models can lead to unintended behaviors such as gradient vanishing (Hu et al., 2018; Su, 2018) or assigning wrong labels with high confidence (Goodfellow et al., 2014). Integrating human knowledge into machine learning can significantly reduce the data required, increase the reliability and robustness of machine learning, and build explainable machine learning systems.

Knowledge in machine learning can be viewed from two perspectives. One is "general knowledge" related to machine learning but independent of the task and data domain. This involves computer science, statistics, neuroscience, etc., which lay the foundation of machine learning. An example is knowledge from neuroscience that can be translated into improved neural network design. The other is "domain knowledge", which broadly refers to knowledge in any field such as physics, chemistry, engineering, and linguistics with domain-specific applications. Machine learning algorithms can integrate domain knowledge in the form of equations, logic rules, and prior distributions into their processes to perform better than purely data-driven machine learning.

General knowledge marks the evolution of machine learning in history. In 1943, the first mathematical model of a neural network was built based on the understanding of human brain cells (McCulloch and Pitts, 1943). In 1957, the perceptron was invented to mimic the "perceptual processes of a biological brain" (Rosenblatt, 1957). Although it was a machine rather than an algorithm as we use today, the invention laid the foundation of deep neural networks (Fogg, 2017). In 1960, gradients in control theory were derived to optimize flight paths (Kelley, 1960). This formed the foundation of backpropagation in artificial neural networks. In 1989, Q-learning was developed based on the Markov process to greatly improve the practicality and feasibility of reinforcement learning (Watkins and Dayan, 1992). In 1997, the concept of long short-term memory was applied to a recurrent neural network (RNN) (Hochreiter and Schmidhuber, 1997). The development of these algorithms, together with an increasing amount of available data and computational power, has brought about today's era of artificial intelligence.

Domain knowledge plays a significant role in enhancing the learning performance. For instance, experts' rating can be used as an input of the data mining tool to reduce the misclassification cost on evaluating lending applications (Sinha and Zhao, 2008). An accurate estimation of the test data distribution by leveraging the domain knowledge can help design better training data sets. Human involvement is essential in several machine learning problems, such as the evaluation of machine-generated videos (Li et al., 2018). Even in areas where machine learning outperforms humans, such as the game of Go (Silver et al., 2017), learning from records of human experience is much faster than self-play at the initial stage.

Knowledge is more or less reflected in all data-based models from data collection to algorithm implementation. Here, we focus on typical areas where human knowledge is integrated to deliver superior performance. In Section Knowledge and Its Representations, we discuss the type of knowledge that has been incorporated in machine learning and its representations. Examples to embed such knowledge will be provided. In Section Methods to Integrate Human Knowledge, we introduce the methodology to incorporate knowledge into machine learning. For a broad readership, we start from the fundamentals and then cover the current status, remarks, and future directions with particular attention to new and popular topics. We do not include the opposite direction, i.e. improving knowledge-based models by data-driven approaches. Different from a review on related topics (Rueden et al., 2019), we highlight the methods to bridge machine learning and human knowledge, rather than focusing on the topic of knowledge itself.

Knowledge and Its Representations

Knowledge is categorized into general knowledge and domain knowledge, as mentioned earlier. General knowledge regarding human brains, the learning process, and how it can be incorporated is discussed in Section Human Brain and Learning. Domain knowledge is specifically discovered, possessed, and summarized by experts in certain fields. In some subject areas, domain knowledge is abstract or empirical, which makes it challenging to integrate into a machine learning framework. We discuss some recent progress on this form of knowledge in Section Qualitative Domain Knowledge. Meanwhile, the knowledge base is becoming more systematic and quantitative in various fields, particularly in science and engineering. We discuss how quantitative domain knowledge can be utilized in Section Quantitative Domain Knowledge.

Human Brain and Learning

Machine learning uses computers to automatically process data to look for patterns. It mimics the learning process of biological intelligence, especially humans. Many breakthroughs in machine learning are inspired by the understanding of learning from fields such as neuroscience, biology, and physiology. In this section, we review some recent works that bring machine learning closer to human learning.

For decades, constructing a machine learning system required careful design of raw data transformation to extract their features for the learning subsystem, often a classifier, to detect or classify patterns in the input. Deep learning (LeCun et al., 2015) relaxes such requirements by stacking multiple layers of artificial neural modules, most of which are subject to learning. Different layers can extract different levels of features automatically during training. Deep learning has achieved record-breaking results in many areas. Despite the dramatic increase in the size of networks, the architecture of current deep neural networks (DNNs) with $10^{7}$ learnable weights is still much simpler than the complex brain network with $10^{11}$ neurons (Herculano-Houzel, 2009) and $10^{15}$ synapses (Drachman, 2005).

Residual neural networks (ResNets) (He et al., 2016), built on convolutional layers with shortcuts between different layers, were proposed for image classification. The input of downstream layers also contains the information from upstream layers far away from them, in addition to the output of adjacent layers. Such skip connections have been observed in brain cells (Thomson, 2010). This technique helped ResNets win first place in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015 classification task (He et al., 2016) and is widely used in other DNNs (Wu et al., 2019).

Dropout, motivated by the "theory of the role of sex in evolution", is a simple and powerful way to prevent overfitting of neural networks (Hinton et al., 2012; Srivastava et al., 2014). It is also analogous to the fact that neurons and connections in human brains keep growing and dying (breaking). At training time, the connections of artificial neurons break randomly, and the remaining ones are scaled accordingly. This process can be regarded as training different models with shared weights. At test time, the outputs are calculated by a single network without dropout, which can be treated as an average of the different models. More recently, dropout has also been proposed for use at test time to evaluate uncertainty (Gal and Ghahramani, 2016; Gal et al., 2017).
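The training-time and test-time behavior described above can be sketched in a few lines. This is a minimal illustration rather than any cited implementation: it assumes the common "inverted" scaling convention (survivors divided by the keep probability), and the toy model, its weights, and the function names are hypothetical.

```python
import random

def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during
    training and scale survivors by 1 / (1 - rate); identity otherwise."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

# Hypothetical fixed weights of a one-layer toy model (illustration only).
WEIGHTS = [0.5, -1.0, 2.0, 0.25]

def toy_forward(x, rng, rate=0.5, training=True):
    """Scale the input by each weight, apply dropout, and sum."""
    hidden = [w * x for w in WEIGHTS]
    return sum(dropout(hidden, rate, rng, training))

def mc_dropout_predict(x, n_samples, rng, rate=0.5):
    """Keep dropout active at test time and summarize repeated passes:
    the spread of the outputs serves as a rough uncertainty estimate."""
    outputs = [toy_forward(x, rng, rate, training=True) for _ in range(n_samples)]
    mean = sum(outputs) / n_samples
    variance = sum((o - mean) ** 2 for o in outputs) / n_samples
    return mean, variance

rng = random.Random(0)
point_estimate = toy_forward(1.0, rng, training=False)   # single network, no dropout
mc_mean, mc_variance = mc_dropout_predict(1.0, 2000, rng)
```

Because of the scaling, the average over many random masks approaches the output of the single network without dropout, which is the "average of different models" reading given in the text.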

The focus of machine learning nowadays is on software or algorithms, yet some researchers have started to redesign the hardware. Current computers, built on the von Neumann architecture, have a power consumption that is several orders of magnitude higher than that of biological brains. For example, International Business Machines Corporation (IBM)'s Blue Gene/P supercomputer simulated 1% of the human cerebral cortex and could consume up to 2.9 MW of power (Hennecke et al., 2012), while a human brain consumes only around 20 W (Modha, 2017). Neuromorphic systems were proposed and designed to improve energy efficiency. To name a few practical implementations, TrueNorth from IBM (Merolla et al., 2014; Modha, 2017) can solve problems from vision, audition, and multi-sensory fusion; Loihi from Intel (Davies et al., 2018) can solve the least absolute shrinkage and selection operator (LASSO) optimization orders of magnitude more efficiently than a conventional central processing unit (CPU); NeuroGrid from Stanford University (Benjamin et al., 2014) offers affordable biological real-time simulations. In these brain-inspired systems, information is stored and processed at each neuron's synapses, which themselves are responsible for learning (Bartolozzi and Indiveri, 2007). There are various models to mimic neurons by circuits, such as the integrate-and-fire (I&F) model (Bartolozzi and Indiveri, 2007), the Hodgkin-Huxley model (Hodgkin and Huxley, 1952), and the Boolean logic gate design (Deshmukh et al., 2005). The majority of implementations use the I&F model, though their specific circuitry and behaviors vary (Nawrocki et al., 2016). As for materials, most systems utilize standard silicon fabrication, which allows for integration with relatively mature technology and facilities. Recently, a few neuromorphic components or systems have been made of new materials, such as organic or nanoparticle composites (Sun et al., 2018; Tuchman et al., 2020).

Qualitative Domain Knowledge

Along with the neuroscience-inspired fast development of deep learning architecture, knowledge from specific domains enlightens innovative machine learning algorithms. In its primitive format, knowledge is descriptive, intuitive, and often expressed by plain language. Such knowledge is usually based on observation, inference, and induction. Here, we group it as qualitative domain knowledge which requires intensive engagement and interpretation by humans with sophisticated skills. Although there is no universal way to bridge machine learning and qualitative knowledge, qualitative knowledge adds unique insights into the machine learning framework by using customized strategies. This integration offers two primary advantages. For one thing, qualitative knowledge is explained mainly by experts, which means that it highly relies on subjective interpretation. Machine learning consolidates the stringency of expert knowledge so that it can be directly validated by the large amount of raw data. Therefore, the qualitative knowledge can be built on a more statistically rigorous base. For another thing, machine learning models, such as the DNN, are subject to interpretability issues. Using qualitative domain knowledge in machine learning helps dig into the underlying theoretical mechanisms.

Qualitative knowledge can be further divided into three subgroups according to the degree of quantification. Here, we name them Knowledge in Plain Language, Loosely Formed Knowledge, and Concretely Formatted Knowledge.

Knowledge in Plain Language

A tremendous amount of qualitative knowledge is well established in different disciplines, especially in social science. Sociology has been developed for thousands of years, with modern theory focusing on globalization and micro-macro structures (Ritzer and Stepnisky, 2017). Political scientists proposed institutional theories to profile current governments (Peters, 2019). These theories are usually in the form of plain language. Traditionally, machine learning is far from these domains. However, many empirical theories actually provide good intuition to understand and design better machine learning models. For example, machine learning researchers found that some widely used DNN models have "shape bias" in word recognition (Ritter et al., 2017). This means that the shape of characters is more important than color or texture in visual recognition. At the same time, research in developmental psychology shows that humans tend to learn new words based on similar shape, rather than color, texture, or size (Nakamura et al., 2012a). This coincidence provides a new theoretical foundation to understand how the DNN identifies objects. Conversely, some unfavorable biases, such as those toward race and gender, should be eliminated (DeBrusk, 2018).

Loosely Formed Knowledge

Qualitative knowledge can be pre-processed so that it is expressed in more numerical formats for use in machine learning. One example is that empirical human knowledge in social science can be inserted into machine learning through qualitative coding (Chen et al., 2018). This technique assigns inferential labels to chunks of data, enabling later model development. For example, social scientists in natural language processing (NLP) use their domain knowledge to group and structure the code system in the postprocessing step (Crowston et al., 2012). Moreover, qualitative coding is able to infer relations among latent variables. Human-understandable input and output are crucial for interpretable machine learning. If the input space contains components that are theoretically justified by humans, the mapping to the output has logical meaning, which is strongly preferred over a purely statistical relationship (Liem et al., 2018).

In addition to social science, qualitative knowledge in natural science can be integrated into machine learning as well. For example, physical theories could guide humans to create knowledge-induced features for Higgs boson discovery (Adam-Bourdarios et al., 2015). Another strategy is to transfer language-based qualitative knowledge into numerical information. In computational molecular design (Ikebata et al., 2017), a molecule is encoded into a string including information such as element types, bond types, and branching components. The string is further processed by NLP models. The final strategy is to use experts to guide the learning process to identify a potential search direction. For instance, in cellular image annotation (Kromp et al., 2016), expert knowledge progressively improves the model through an online multi-level learning.

Concretely Formatted Knowledge

Although qualitative knowledge is relatively loose in both social and natural science, there are some formalized ways to represent it. For example, logic rules are often used to show simple relationships, such as implication ($A \Rightarrow B$), equivalence ($A \Leftrightarrow B$), conjunction ($A \wedge B$), disjunction ($A \vee B$), and so on. The simple binary relationship can be extended to include more entities by parenthesis association. It can be defined as first-order logic, in which each statement can be decomposed into a subject, a predicate, and their relationship. Logic-rule-regularized machine learning models have attracted attention recently. For example, a "but" keyword in a sentence usually indicates that the clause after it dominates the overall sentiment. This first-order logic can be distilled into a neural network (Hu et al., 2016). In a material discovery framework called CombiFD, complex combinatorial priors expressing whether the data points belong to the same cluster (Must-Link) or not (Cannot-Link) can be used as constraints in model training (Ermon et al., 2014).
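To make the "but" rule concrete, the sketch below builds a rule-adjusted "teacher" distribution and a combined training loss, loosely in the spirit of rule distillation. It is an assumption-laden illustration, not the cited authors' code: the exponential reweighting, the `strength` and `imitation` parameters, the function names, and the example probabilities are all chosen here for demonstration.

```python
import math

def teacher_distribution(p_sentence, p_clause_b, strength=6.0):
    """Reweight the sentence-level prediction so that labels agreeing with
    the clause after 'but' gain mass. `p_clause_b[label]` is used as the
    soft truth value of the rule for that label (an assumption)."""
    scores = {
        label: p_sentence[label] * math.exp(-strength * (1.0 - p_clause_b[label]))
        for label in p_sentence
    }
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

def distillation_loss(p_sentence, teacher, true_label, imitation=0.5):
    """Blend cross-entropy against the true label with cross-entropy
    against the rule-adjusted teacher distribution."""
    ce_truth = -math.log(p_sentence[true_label])
    ce_teacher = -sum(teacher[l] * math.log(p_sentence[l]) for l in p_sentence)
    return (1.0 - imitation) * ce_truth + imitation * ce_teacher

# Hypothetical model outputs for a sentence of the form "A but B".
p_sentence = {"positive": 0.6, "negative": 0.4}   # prediction on the full sentence
p_clause_b = {"positive": 0.1, "negative": 0.9}   # prediction on clause B alone
teacher = teacher_distribution(p_sentence, p_clause_b)
loss = distillation_loss(p_sentence, teacher, "negative")
```

With `strength=0` the teacher reduces to the original prediction, so the rule's influence can be tuned continuously between "ignored" and "dominant".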

Besides logic rules, invariance is another major format of qualitative knowledge which is not subject to change after transformation. An ideal strategy is to incorporate invariance in machine learning models, i.e., to build models invariant to input transformation, yet this is sometimes very difficult. Some details are shown in Section Symmetry of Convolutional Neural Networks. Besides models, one can leverage the invariant properties by preprocessing the input data. One way is to find a feature extraction method whose output is constant when the input is transformed within the symmetry space. For example, the Navier-Stokes equations obey the Galilean invariance. Consequently, seven tensors can be constructed from the velocity gradient components to form an invariant basis (Ling et al., 2016). The other way is to augment the input data based on invariance and feed the data to models (see Section Data Augmentation).
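Both preprocessing routes can be illustrated on a much simpler symmetry than Galilean invariance. The sketch below assumes 2D point sets and rotation invariance (a stand-in chosen for brevity, not the tensor basis of the cited work); the function names and the example shape are hypothetical.

```python
import math
import random
from itertools import combinations

def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def invariant_features(points):
    """Route 1: features that do not change under rotation.
    Sorted pairwise distances are unaffected by rotating the input."""
    return sorted(
        math.hypot(px - qx, py - qy)
        for (px, py), (qx, qy) in combinations(points, 2)
    )

def augment_by_rotation(points, n_copies, rng):
    """Route 2: keep raw coordinates but add randomly rotated copies,
    so a model sees the symmetry in its training data."""
    return [rotate(points, rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n_copies)]

shape = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
rotated = rotate(shape, 0.7)
augmented = augment_by_rotation(shape, 4, random.Random(0))
```

The first route guarantees invariance exactly but requires knowing a suitable feature map; the second only encourages it, at the cost of a larger training set.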

Quantitative Domain Knowledge

In scientific domains, a large amount of knowledge has been mathematically defined and expressed, which facilitates a quantitative analysis using machine learning. In the following sections, three groups of quantitative knowledge, in terms of their representation formats, are discussed: equation-based, probability-based, and graph-based knowledge.

Equation-Based Knowledge

Equality and inequality relationships can be established by algebraic and differential equations and inequalities, respectively. They are the predominant knowledge format in physics, mathematics, etc. At the same time, there is an increasing amount of equation-based knowledge in chemistry, biology, engineering, and other experiment-driven disciplines. A great benefit of equations in the aforementioned areas is that most variables have physical meanings understandable by humans. Even better, many of them can be measured or validated experimentally. The insertion and refinement of expert knowledge in terms of equations can be static or dynamic. Static equations often express a belief or truth that is not subject to change, so they do not capture changes in circumstances. Dynamically evolving equations, such as those used in the control area, are used to express continuously updating processes. Equations in different categories play diverse roles in the machine learning pipeline, so equation-based knowledge can be further divided into subgroups according to their complexity.

The simplest format is a ground-truth equation, expressing consensus such as $\text{Mass} = \text{Density} \times \text{Volume}$. Since it cannot be violated, this type of equation is usually treated as a constraint to regularize the training process in the format of $\text{Loss}(x) = \text{original\_Loss}(x) + \sum_i \lambda_i h_i(x)$, where $h_i(x)$ is the equation-enforced regularization term and $\lambda_i$ is its weight. For example, the object trajectory under gravity can be predicted by a convolutional neural network (CNN) without any labeled data, by just using a kinematic equation as the regularization term (Stewart and Ermon, 2017). The kinematic equation is easily expressed as a quadratic univariate equation of time, $h(t) = a t^2 + v_0 t + h_0$. In another study, a robotic agent is designed to reach an unknown target (Ramamurthy et al., 2019). The solid body property enforces a linear relationship of segments, which serves as a regularizer in the policy architectures. The confidence in the ground truth influences the degree of regularization through a soft or hard hyperparameter. For a complicated task, an expert must choose the confidence level properly.
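The regularized loss above can be sketched for the free-fall example. This is a simplified stand-in, not the cited CNN setup: the "model" is just a list of per-frame heights, two frames are labeled, the penalty uses second differences (for a parabola sampled every `dt`, $h_{i+1} - 2h_i + h_{i-1} = 2 a\,\mathrm{d}t^2$), and a finite-difference gradient replaces backpropagation. All names and constants are hypothetical.

```python
def physics_penalty(heights, dt, accel):
    """h(x): squared violation of h(t) = a*t^2 + v0*t + h0, checked through
    second differences, which equal 2*a*dt^2 for any v0 and h0."""
    target = 2.0 * accel * dt * dt
    return sum(
        (heights[i + 1] - 2.0 * heights[i] + heights[i - 1] - target) ** 2
        for i in range(1, len(heights) - 1)
    )

def data_loss(heights, labels):
    """original_Loss(x): squared error on the (possibly few) labeled frames."""
    return sum((heights[i] - y) ** 2 for i, y in labels.items())

def total_loss(heights, labels, dt, accel, lam):
    """Loss(x) = original_Loss(x) + lambda * h(x)."""
    return data_loss(heights, labels) + lam * physics_penalty(heights, dt, accel)

def numerical_gradient(f, x, eps=1e-6):
    """Central finite differences; used only to keep the sketch short."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad

def fit_trajectory(labels, n_frames, dt, accel, lam=1.0, lr=0.02, steps=3000):
    """Plain gradient descent on the regularized loss."""
    heights = [0.0] * n_frames
    loss = lambda h: total_loss(h, labels, dt, accel, lam)
    for _ in range(steps):
        grad = numerical_gradient(loss, heights)
        heights = [h - lr * g for h, g in zip(heights, grad)]
    return heights

ACCEL = -4.905   # a = -g/2 in m/s^2 (assumed known physics)
DT = 0.1         # seconds between frames (assumed)
true_heights = [ACCEL * (i * DT) ** 2 + 3.0 * (i * DT) + 1.0 for i in range(6)]
labels = {0: true_heights[0], 5: true_heights[5]}   # only two labeled frames
fitted = fit_trajectory(labels, 6, DT, ACCEL)
```

With only two labels, the data term alone leaves four heights undetermined; the equation-based term fills them in, which illustrates how such a constraint can reduce the amount of labeled data needed.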

At the second level, the equation has a concrete format and constant coefficients, with single or multiple unknown variables and their derivatives. The relationship among those variables is deterministic. This means that the coefficients of these equations are state independent. In particular, ordinary differential equations (ODEs) and partial differential equations (PDEs) belong to this category and are being researched extensively within machine learning. They have the generalized form $f\left(x_1, \ldots, x_n, \frac{\partial u}{\partial x_1}, \ldots\right) = 0$. Although only a few differential equations have explicit solutions, as long as their formats can be determined by domain knowledge, machine learning can solve them numerically. This strategy inspires data-driven PDE/ODE solvers (Samaniego et al., 2020). Prior knowledge, such as periodicity, monotonicity, or smoothness of the underlying physical process, is reflected in the kernel function and its hyperparameters.
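The idea of solving a differential equation by fitting a parametric function to the equation itself, rather than to labeled solution values, can be sketched on a toy ODE. The example assumes $u'(x) + u(x) = 0$ with $u(0) = 1$ on $[0, 1]$ and swaps in a polynomial trial function for the neural networks used in the literature; since the residual is then linear in the coefficients, least squares replaces gradient training. All names are hypothetical.

```python
import math

def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def trial_solution(x, w):
    """u(x) = 1 + sum_k w_k x^k, which satisfies u(0) = 1 by construction."""
    return 1.0 + sum(wk * x ** (k + 1) for k, wk in enumerate(w))

def fit_ode(degree=4, n_points=21):
    """Minimize the squared residual of u' + u = 0 at collocation points.
    The residual is 1 + sum_k w_k * (k x^(k-1) + x^k)."""
    xs = [i / (n_points - 1) for i in range(n_points)]
    phi = [[k * x ** (k - 1) + x ** k for k in range(1, degree + 1)] for x in xs]
    gram = [[sum(row[i] * row[j] for row in phi) for j in range(degree)]
            for i in range(degree)]
    rhs = [-sum(row[i] for row in phi) for i in range(degree)]
    return solve_linear(gram, rhs)

weights = fit_ode()
approx_at_one = trial_solution(1.0, weights)   # compare with exp(-1)
```

No solution values are supplied anywhere: the only "supervision" is the form of the equation and its initial condition, both of which come from domain knowledge.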

In its most complicated format, equations may not fully generalize domain knowledge when the system has characteristics of high uncertainty, continuous updating, or ambiguity. The coefficients are state dependent or unknown functionals. These issues can be partially addressed by building a hybrid architecture of machine learning and PDE, in which machine learning helps predict or resolve such unknowns. For example, PDE control can be formulated as a reinforcement learning problem (Farahmand et al., 2017). The most extreme condition is that the form of coefficient/equation is unknown, but it can still be learned by machine learning purely from harnessing the experimental and simulation data. For example, governing equations expressed by parametric linear operators, such as heat conduction, can be discovered by machine learning (Raissi et al., 2017). Maximum likelihood estimation (MLE) with Gaussian process priors is used to infer the unknown coefficients from the noisy data. In another example, researchers propose to estimate nonlinear parameters in PDEs using quantum-behaved particle swarm optimization with Gaussian mutation (Tian et al., 2015). Inverse modeling can also be used to reconstruct functional terms (Parish and Duraisamy, 2016).

Probability-Based Knowledge

Knowledge in the form of probabilistic relations is another major type of quantitative knowledge. A powerful tool used in machine learning is Bayes' theorem which regulates the conditional dependence. We have prior knowledge of the relations of variables and their distributions. Given data, we could adjust some probabilities to fit observations.

Some machine learning algorithms have an intrinsic structure to capture probabilistic relations. For example, parameters of the conditional distribution in a Bayesian network can be learned from data or directly from encoded knowledge (Flores et al., 2011; Masegosa and Moral, 2013). Domain knowledge can also be directly used to determine probabilities. For instance, gene relations help build optimal Bayesian classification by mapping into a set of constraints (Boluki et al., 2017). If genes $g_2$ and $g_3$ regulate $g_1$ such that $X_1 = 1$ when $X_2 = 1$ and $X_3 = 0$ according to the domain knowledge, the constraint can be enforced by $P(X_1 = 1 \mid X_2 = 1, X_3 = 0) = 1$.
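A minimal way to picture such a constraint is a conditional probability table that is estimated from data and then overridden where domain knowledge fixes an entry. This is a deliberately simple illustration, not the optimal Bayesian classification procedure of the cited work; the Laplace smoothing, the function name, and the sample data are assumptions.

```python
from collections import Counter

def estimate_cpt(samples, constraints=None, alpha=1.0):
    """Estimate P(X1 = 1 | X2, X3) from binary samples (x1, x2, x3) with
    Laplace smoothing `alpha`, then overwrite entries fixed by knowledge.
    `constraints` maps (x2, x3) to a known probability."""
    ones, totals = Counter(), Counter()
    for x1, x2, x3 in samples:
        totals[(x2, x3)] += 1
        ones[(x2, x3)] += x1
    cpt = {}
    for x2 in (0, 1):
        for x3 in (0, 1):
            key = (x2, x3)
            cpt[key] = (ones[key] + alpha) / (totals[key] + 2.0 * alpha)
    for key, prob in (constraints or {}).items():
        cpt[key] = prob
    return cpt

# Hypothetical noisy measurements: one sample contradicts the known regulation.
samples = [(1, 1, 0), (0, 1, 0), (1, 1, 0), (0, 0, 0), (1, 1, 1), (0, 0, 1)]
data_only = estimate_cpt(samples)
with_knowledge = estimate_cpt(samples, constraints={(1, 0): 1.0})
```

With few or noisy samples, the data-only estimate for the regulated case stays well below 1, whereas the constrained table encodes the known relation exactly and leaves the remaining entries to the data.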

Graph-Based Knowledge

In both natural science and social science fields, a lot of knowledge has the "subject verb object" structure. A knowledge graph consists of entities and links among them. It can be expressed as a set of triples. The degree of their correlation can be numerically expressed and graphically visualized. Knowledge graphs are initially built by expert judgment from data. With growing size of data available, machine learning algorithms play an important role in constructing large knowledge graphs.
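As a small illustration of the triple representation, the sketch below stores "subject, relation, object" facts and answers simple queries. The class, its method names, and the relation labels are hypothetical; real knowledge graphs add typed entities, weights, and far larger scale.

```python
from collections import defaultdict

class KnowledgeGraph:
    """A knowledge graph stored as a set of (subject, relation, object) triples."""

    def __init__(self):
        self.triples = set()
        self._outgoing = defaultdict(set)   # subject -> {(relation, object)}

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))
        self._outgoing[subject].add((relation, obj))

    def objects(self, subject, relation):
        """All objects linked from `subject` by `relation`."""
        return {o for r, o in self._outgoing[subject] if r == relation}

    def related(self, entity):
        """Entities one link away from `entity`, in either direction."""
        outgoing = {o for _, o in self._outgoing[entity]}
        incoming = {s for s, _, o in self.triples if o == entity}
        return outgoing | incoming

kg = KnowledgeGraph()
kg.add("NBA", "is_league_of", "basketball")
kg.add("Los Angeles Lakers", "plays_in", "NBA")
kg.add("Boston Celtics", "plays_in", "NBA")
neighbors = kg.related("NBA")
```

A query such as `kg.related("NBA")` is the kind of lookup behind showing related teams next to a search result.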

The Google knowledge graph is an example that we access daily. It initially covered 570 million entities and 18 billion facts and has kept growing to over 500 billion facts on ∼5 billion entities (Paulheim, 2017). For example, when you search basketball in Google, highly related NBA teams and stars will appear on the right. Another famous knowledge graph built from general knowledge is ConceptNet, which connects words and phrases of natural language with labeled edges. It can be combined with word embeddings to better capture the underlying meanings (Speer et al., 2016). Specific knowledge graphs are popular in different domains. In NLP, WordNet is a semantic network that links synonyms to highlight their meanings rather than spelling. It can enhance the performance of search applications (Fellbaum, 2012). Medical and biological fields have, for instance, MeSH (Lowe and Barnett, 1994). It is a hierarchically organized vocabulary used for indexing, cataloging, and searching of biomedical and health-related information, upon which some machine learning models are built (Abdelaziz et al., 2017; Gan et al., 2019).

Knowledge graphs and machine learning mutually benefit each other. A graph may be incomplete when there are missing entities, relations, or facts and thus needs machine learning approaches to supplement the information. For example, a knowledge graph can be trained together with recommendation tasks to connect items (Cao et al., 2019). A human-computer interaction and knowledge discovery approach (Holzinger et al., 2013) can be used to identify, extract, and formalize useful data patterns from raw medical data. Meanwhile, knowledge graphs help humans understand the related fields to promote machine learning. For instance, a better understanding of biology and neuroscience leads to advanced machine learning algorithms and neuromorphic systems (Section Human Brain and Learning). Besides, machine learning models, such as neural networks, can be built upon knowledge graphs, as is further discussed in Section Design of Neuron Connections.

Methods to Integrate Human Knowledge

A complete process to design and implement a machine learning model consists of multiple steps. The first step is to formulate the appropriate tasks based on the goal. One needs to determine what the machine should learn, i.e., the inputs and outputs. After simplifying the problem by some assumptions, a model is built with unknown parameters to explain or correlate the inputs and outputs. Then, the model is initialized and trained by the collected data. After this, the machine learning model is ready for inference and evaluation. In practice, the process is not necessarily in such a chronological order but usually follows an iterative process. Some algorithms incorporate humans in the loop for feedback and evaluation. Human knowledge can be incorporated almost anywhere in this process. In the following sections, we review the methods to integrate human knowledge into machine learning. We organize them based on the sub-domains of machine learning field and group them according to their major contribution to the steps aforementioned. We should note that (1) there are numerous approaches, thus we focus on popular or emerging methods that could work efficiently across disciplines; (2) the methods are interwoven, namely, they may be used in several sub-domains and contribute to multiple steps; thus, we consider their main categories and detail them only in one place.

Task Formulation

Machine learning models, such as neural networks and support vector machines, take an array of numbers in the form of vectors or matrices as inputs and make predictions. For a given goal, there remains flexibility for humans to formulate the task, i.e., to determine the inputs and outputs of the machine learning models. Humans could combine similar tasks based on their background and shared information (Section Multitask Learning). Domain knowledge is necessary to understand and make use of the similarity of tasks. Also, we need to carefully decide the inputs of machine learning models (Section Features) to best represent the essence of the tasks. We can leverage expert knowledge in the domain or some statistical tools when engineering these features.

Multitask Learning

Humans do not learn individual tasks in a linear sequence, but they learn several tasks simultaneously. This efficient behavior is replicated in machine learning with multitask learning (MTL). MTL shares knowledge between tasks so they are all learned simultaneously with higher overall performance (Ruder, 2017a). By learning the tasks simultaneously, MTL helps to determine which features are significant and which are just noise in each task (Ruder, 2017a). Human knowledge is used in MTL to determine if a group of tasks would benefit from being learned together. For example, autonomous vehicles use object recognition to drive safely and arrive at the intended destination. They must recognize pedestrians, signs, road lines, other vehicles, etc. Machine learning could be trained to recognize each object individually with supervised learning, but human knowledge tells us that the objects share an environment. The additional context increases accuracy as MTL finds a solution that fits all tasks. MTL has also been used extensively for facial landmark detection (Ehrlich et al., 2016; Ranjan et al., 2017; Trottier et al., 2017; Zhang et al., 2014) and has even contributed to medical research through drug behavior prediction (Lee and Kim, 2019; Mayr et al., 2016; Yuan et al., 2016) and through drug discovery (Ramsundar et al., 2015). Object recognition is a common use of MTL due to its proven benefits when used alongside CNNs (Girshick, 2015; Li et al., 2016).

Currently, the two main methods of task sharing are hard and soft parameter sharing as shown in Figure 1. Hard parameter sharing is where some hidden layers are shared while the output layers remain task specific. In soft parameter sharing, each task has its own model and parameters, but the parameters are encouraged to be similar through the L2 norm regularization. Currently, hard parameter sharing is more common due to being more effective in reducing overfitting. Alternative methods to hard and soft parameter sharing have been proposed, such as deep relationship networks (Long et al., 2017), cross-stitch networks (Misra et al., 2016), and sluice networks (Ruder et al., 2019). Deep relationship networks use matrices to connect the task-specific layers so they also increase the performance alongside the shared layers. Cross-stitch networks attempt to find the optimal combination of shared and specific layers. Sluice networks combine several techniques to learn which layers should be shared. These methods aim to find a general approach that works broadly so that it can be easily used for all MTL problems. However, so far, the optimal task sharing method is different for each application. This means human knowledge on the application subject and on the various task sharing methods is a necessity to find the best method for the application. It has been found that MTL is unlikely to improve performance unless the tasks and the weighting strategies are carefully chosen (Gong et al., 2019). Continued research is needed for optimal strategies to choose and balance tasks.

Figure 1. Illustration of Hard and Soft Parameter Sharing

(A) hard parameter sharing.

(B) soft parameter sharing.

Redrawn from (Ruder, 2017a).
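
The two schemes in Figure 1 can be sketched with toy linear layers. This is a minimal illustration rather than the implementation of any cited work: the layer sizes, the task names, and the helpers `linear`, `random_layer`, and `l2_sharing_penalty` are all assumptions made for the example.

```python
import random


def linear(weights, inputs):
    """Apply one dense layer without bias: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def random_layer(n_out, n_in, rng):
    """Create an n_out x n_in weight matrix with values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


rng = random.Random(0)
x = [0.5, -1.0, 2.0]

# Hard parameter sharing: one shared hidden layer, one output head per task.
shared = random_layer(4, 3, rng)
heads = {"task_a": random_layer(1, 4, rng), "task_b": random_layer(1, 4, rng)}
hidden = linear(shared, x)  # computed once and reused by every task
hard_outputs = {name: linear(head, hidden) for name, head in heads.items()}

# Soft parameter sharing: each task owns a full model, and an L2 penalty on
# the difference between corresponding weights pulls the models together.
model_a = random_layer(4, 3, rng)
model_b = random_layer(4, 3, rng)


def l2_sharing_penalty(w_a, w_b):
    """Sum of squared differences between two weight matrices of equal shape."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(w_a, w_b)
               for a, b in zip(row_a, row_b))


penalty = l2_sharing_penalty(model_a, model_b)
# A training loop would minimize: loss_a + loss_b + strength * penalty,
# where strength is a hyperparameter chosen by the practitioner.
```

In the hard scheme the shared layer receives gradients from every task, whereas in the soft scheme the coupling between tasks is controlled entirely by the weight given to the penalty.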

Another important area in MTL is to gain benefits even in the case where only one task is important. Research shows that MTL can still be used in this situation by finding an appropriate auxiliary task to support the main task (Ruder, 2017a). Similar to finding the best task sharing method, significant human knowledge is needed to find an effective auxiliary task. Another approach is to use an adversarial auxiliary task which achieves the opposite purpose of the main task. By maximizing the loss function of the adversarial task, information can be gained for the main task. There are several other types of auxiliary tasks, but sometimes an auxiliary task is not even needed. In recent developments, MTL principles are utilized even in single task settings. Pseudo-task augmentation is where a single task is being solved; however, multiple decoders are used to solve the task in different ways (Meyerson and Miikkulainen, 2018). Each solving method is treated as a different task and implemented into MTL. This allows a task to be solved optimally as each method of solving the task learns from the other methods.

Features

"Features" in machine learning refer to variables that represent the properties or characteristics of the observations. They could be statistics (e.g., mean, deviation), semantic attributes (e.g., color, shape), transformations of data (e.g., power, logarithm), or just part of the raw data. "Feature engineering" is the process of determining and obtaining input features to optimize machine learning performance. In this section, we discuss the approaches used in this process.

Feature engineering, especially feature creation, relies heavily on human knowledge and experience in the application area. For instance, in a credit card fraud detection task, there are many items associated with each transaction, such as transaction amount, merchant ID, and card type. A simple model treats each transaction independently and classifies the transactions by eliminating unimportant items (Brause et al., 1999). Later, people realized that customer spending behaviors matter as well. Indicators of these behaviors, such as transactions during the last given number of hours and countries, are then aggregated (Whitrow et al., 2009). This methodology is further amended by capturing transaction time and its periodic property (Bahnsen et al., 2016). This example shows that it is an iterative process in which feature engineering and model evaluation interplay (Brownlee, 2014). Although the guidelines vary with specific areas, a rule of thumb is to find the best representation of the sample data to learn a solution.

In addition to domain knowledge, some statistical metrics for feature selection are widely used across different areas. There are mainly two types of methods (Ghojogh et al., 2019). The first type, filter methods, ranks the features by their relevance (correlation of features with targets) or redundancy (whether the features share redundant information) and removes some of them by setting a threshold. This type includes the linear correlation coefficient, mutual information, the consistency-based filter (Dash and Liu, 2003), and many others. The second type, wrapper methods, trains and tests the model during the search and looks for the subset of features that corresponds to the optimal model performance. Since the number of combinations grows exponentially with the number of features, finding the optimal subset is a non-deterministic polynomial-time (NP)-hard problem. One search strategy, sequential selection, assesses the models sequentially, adding or removing features one step at a time (Aha and Bankert, 1996). The other strategy is to implement metaheuristic algorithms such as the binary dragonfly algorithm (Mafarja et al., 2017), genetic algorithm (Frohlich et al., 2003), and binary bat algorithm (Nakamura et al., 2012b). Wrapper methods can be applied to simple models instead of the original ones to reduce computation. For instance, Boruta (Kursa and Rudnicki, 2010) uses a wrapper method on random forests, which in essence can be regarded as a filter method. In addition to these two types, there exist other techniques such as embedded methods, clustering techniques, and semi-supervised learning (Chandrashekar and Sahin, 2014).
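
The difference between the two families can be sketched in a few lines. The example below is only an illustration under stated assumptions: the toy columns `amount`, `noise`, and `copy` are invented, and `subset_score` is a hypothetical stand-in for the train-and-test step that a real wrapper method would perform with an actual model.

```python
from math import sqrt


def pearson(xs, ys):
    """Linear correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def filter_select(features, target, threshold):
    """Filter method: keep features whose |correlation| reaches the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]


def forward_select(names, score, max_features):
    """Wrapper method: greedily add the feature that most improves the score."""
    selected, best = [], score([])
    while len(selected) < max_features:
        candidates = [n for n in names if n not in selected]
        if not candidates:
            break
        top = max(candidates, key=lambda n: score(selected + [n]))
        top_score = score(selected + [top])
        if top_score <= best:  # no candidate helps, so stop searching
            break
        selected.append(top)
        best = top_score
    return selected


# Toy data (invented): "copy" duplicates the information in "amount".
features = {
    "amount": [1, 2, 3, 4, 5],
    "noise": [3, 1, 4, 1, 5],
    "copy": [2, 4, 6, 8, 10],
}
target = [1.1, 2.0, 2.9, 4.2, 5.1]


def subset_score(subset):
    """Hypothetical model score: fit quality minus a small cost per feature."""
    if not subset:
        return 0.0
    combined = [sum(values) for values in zip(*(features[n] for n in subset))]
    return abs(pearson(combined, target)) - 0.01 * len(subset)


by_filter = filter_select(features, target, threshold=0.9)
by_wrapper = forward_select(list(features), subset_score, max_features=3)
```

On this toy data the relevance-only filter keeps both `amount` and its redundant duplicate, while the wrapper search stops after `amount` because adding `copy` no longer improves the score.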

"Feature learning", also called "representation learning", is a set of techniques that allow machines to discover or extract features automatically. Since the raw data may contain redundant information, we normally want to extract features with lower dimension, i.e., to find a mapping $\mathbb{R}^d \to \mathbb{R}^p$ where $p \le d$ and usually $p \ll d$. Thus, these methods are sometimes referred to as dimension reduction. Traditional approaches use statistical methods to extract features. Unsupervised (unlabeled data) methods include principal component analysis (Wold et al., 1987), maximum variance unfolding (Weinberger and Saul, 2006), Laplacian eigenmap (Belkin and Niyogi, 2003; Chen et al., 2010), and t-distributed stochastic neighbor embedding (t-SNE) (Maaten and Hinton, 2008). Supervised (labeled data) methods include Fisher linear discriminant analysis (Fisher, 1936), its variant kernel Fisher linear discriminant analysis (Mika et al., 1999), partial least squares regression (Dhanjal et al., 2008), and many approaches to supervised principal component analysis (Bair and Tibshirani, 2004; Barshan et al., 2011; Daniušis et al., 2016). Recent works are mostly based on neural networks, such as autoencoders (Baldi, 2012), CNNs, and deep Boltzmann machines (Salakhutdinov and Hinton, 2009). Even though these methods extract features automatically, prior knowledge can still be incorporated. For instance, to capture the features of videos with slow moving objects, we can represent the objects by a group of numbers (such as their positions in space and the pose parameters) rather than by a single scalar, and these groups tend to move together (Bengio et al., 2013; Hinton et al., 2011). Or we can deal with the moving parts and the static background separately (Li et al., 2018). Another example, which applies the principle of Multitask Learning, is to train an image encoder and a text encoder simultaneously and correlate their extracted features (Reed et al., 2016). Other strategies to integrate knowledge include data manipulation and neural network design, as will be discussed in the following sections.
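
As a concrete instance of such a dimension-reducing mapping, the sketch below projects two-dimensional points onto their first principal component (d = 2, p = 1). It uses power iteration on the covariance matrix purely to stay within the standard library, which is an assumption of this example; practical implementations normally rely on a linear algebra package. The data points are invented.

```python
def first_principal_component(rows, iterations=200):
    """Return the column means and the leading eigenvector of the covariance."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    direction = [1.0] * d
    for _ in range(iterations):
        # Power iteration: repeatedly apply the covariance and renormalize.
        product = [sum(cov[i][j] * direction[j] for j in range(d))
                   for i in range(d)]
        norm = sum(value * value for value in product) ** 0.5
        direction = [value / norm for value in product]
    return means, direction


def project(rows, means, direction):
    """Map each d-dimensional row to a single coordinate along the direction."""
    return [sum((row[j] - means[j]) * direction[j] for j in range(len(direction)))
            for row in rows]


# Invented points scattered around the line y = 2x.
points = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
means, direction = first_principal_component(points)
codes = project(points, means, direction)  # one number per point
```

Because the points lie almost on a line, a single coordinate per point retains nearly all of the variance in the data.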

Model Assumptions

Machine learning models are built upon assumptions or hypotheses. Since "hypothesis" in the machine learning field commonly refers to model candidates, we use the word "assumption" to denote explicit or implicit choices, presumed patterns, and other specifications on which models are based for simplification or reification. The assumptions could be probabilistic or deterministic. We first introduce probabilistic assumptions in Sections Preliminaries of Probabilistic Machine Learning, Variable Relation, and Distribution, and then briefly discuss deterministic ones in Section Deterministic Assumptions.

Preliminaries of Probabilistic Machine Learning

Mathematically, what are we looking for when we train a machine learning model? From the perspective of probability, two explanations have been proposed as shown in Equations (1) and (2) (Murphy, 2012). One is called maximum likelihood estimation (MLE), which means that the model maximizes the likelihood of the training data set,

$$\theta_{\mathrm{MLE}} = \arg\max_{\theta}\, p(D \mid \theta) \qquad \text{(Equation 1)}$$

where D is the training data set and p(D|θ) is the probability of data provided by the machine learning model whose parameter is θ. The machine learning model estimates the probability of observation, while the training process tries to find the parameter which best accords with the observation. The specific form of D depends on models. For example, in supervised learning, we can rewrite the form as p(Y|X,θ), i.e., the probability of label Y given input X and model parameters θ, and we regard the label with the highest probability as the model's prediction.

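The other strategy, maximum a posteriori (MAP) estimation, maximizes the posterior probability of the model parameter,

$$\theta_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid D) = \arg\max_{\theta}\, \frac{p(D \mid \theta)\, p(\theta)}{p(D)} = \arg\max_{\theta}\, p(D \mid \theta)\, p(\theta) \qquad \text{(Equation 2)}$$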

MAP takes into account p(θ), the prior distribution of θ. From the equations above, we can observe that MLE is a special case of MAP where p(θ) is uniform.
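
A coin-flip model makes the relation between the two estimates concrete. The sketch below assumes a Bernoulli likelihood with a Beta(alpha, beta) prior on the probability of heads; this choice of model and the function names are ours, made only for illustration. With the uniform prior Beta(1, 1), the MAP estimate coincides with the MLE.

```python
def bernoulli_mle(heads, flips):
    """Maximum likelihood estimate of the probability of heads."""
    return heads / flips


def bernoulli_map(heads, flips, alpha, beta):
    """MAP estimate under a Beta(alpha, beta) prior.

    This is the mode of the posterior Beta(heads + alpha, tails + beta),
    which is valid when both posterior parameters exceed 1.
    """
    return (heads + alpha - 1) / (flips + alpha + beta - 2)


heads, flips = 3, 4
mle = bernoulli_mle(heads, flips)                  # 3/4
map_uniform = bernoulli_map(heads, flips, 1, 1)    # uniform prior: equals MLE
map_fair = bernoulli_map(heads, flips, 5, 5)       # prior belief in a fair coin
```

With only four flips, the prior centered on a fair coin pulls the estimate from 0.75 toward 0.5, which is how prior knowledge compensates for scarce data.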

Variable Relation

A variable could be an instance (data point), a feature, or a state of the system. An assumption on variable relation used in almost all models is the independence of data instances; therefore, the probability of data sets is equal to the product of instance probabilities. For instance, in unsupervised learning we have $p(X \mid \theta) = \prod_i p(x^{(i)} \mid \theta)$.

When the variables are related, we could simplify the likelihood function by assuming partial independence. For instance, in image generation models, pixels are correlated to each other; thus, an image x is associated with all pixels $x_i$ (i = 1, 2, ...). Pixel RNN (Oord et al., 2016a) assumes that pixel $x_i$ is only correlated to its previous pixels and independent of those afterward:

$$p(x) = p(x_1, x_2, \ldots, x_N) \approx \prod_{i=1}^{N} p(x_i \mid x_1, x_2, \ldots, x_{i-1}) \qquad \text{(Equation 3)}$$

Under this assumption, pixels are generated sequentially by an RNN. As a simpler implementation, we can pay more attention to the local information and calculate a pixel value by the previous ones close to it (Oord et al., 2016b).

In the stochastic process of time series, Equation (3) is commonly used, and $x_i$ denotes the variable value at time i. In this context, the approximation is more a matter of knowledge than an assumption since it is hard to imagine that the future would impact the past. It can be simplified further by assuming that future probabilities are determined by their most recent values, $p(x_i \mid x_1, x_2, \ldots, x_{i-1}) \approx p(x_i \mid x_{i-1})$; the variables $x_i$, called states, then form a Markov chain. A more flexible model, the hidden Markov model (HMM), treats the states as hidden variables, and the observations have a conditional probability distribution given the states. The HMM is widely used in the stock market, data analysis, speech recognition, and so forth (Mor et al., 2020). Such a Markov property is also presumed in reinforcement learning (Section Reinforcement Learning), where the next state is determined by the current state and action.
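
Under the Markov assumption, the probability of a sequence reduces to a product of one-step transition probabilities, which can be estimated by counting. The sketch below is a minimal illustration; the weather sequence, the initial distribution, and the function names are invented for the example.

```python
from collections import Counter, defaultdict


def fit_markov_chain(sequence):
    """Estimate p(next | previous) from consecutive pairs in the sequence."""
    counts = defaultdict(Counter)
    for previous, following in zip(sequence, sequence[1:]):
        counts[previous][following] += 1
    return {state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for state, row in counts.items()}


def sequence_probability(sequence, initial, transitions):
    """p(x_1) times the product of p(x_i | x_{i-1}) over the sequence."""
    probability = initial.get(sequence[0], 0.0)
    for previous, following in zip(sequence, sequence[1:]):
        probability *= transitions.get(previous, {}).get(following, 0.0)
    return probability


observed = ["sunny", "sunny", "rainy", "sunny", "sunny", "sunny", "rainy", "rainy"]
transitions = fit_markov_chain(observed)
initial = {"sunny": 0.5, "rainy": 0.5}  # assumed initial distribution
prob = sequence_probability(["sunny", "rainy", "rainy"], initial, transitions)
```

Each factor depends only on the immediately preceding state, so the model needs one small transition table instead of a distribution over entire histories.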

In addition to sequential dependence, the relation between variables can be formulated as a directed acyclic graph or a Bayesian network in which nodes represent the variables and edges represent the probabilistic dependencies. Learning the optimal structure of a Bayesian network is an NP-hard problem, which means that it requires a huge amount of computation and may not converge to the global optimum. Prior knowledge of the variables can be incorporated through their dependencies, such as the existence or absence of an edge and even the probability distribution (Su et al., 2014; Xu et al., 2015). The learned networks improve while the computational cost is reduced.

Distribution

The distribution of variables is unknown and has to be assumed or approximated by sampling. A very popular and fundamental distribution is the Gaussian distribution, a.k.a. normal distribution. Its popularity is not groundless; instead, it is based on the fact that the mean of a large number of independent random variables, regardless of their own distributions, tends toward a Gaussian distribution (central limit theorem). Real-world data are composed of many underlying factors, so the aggregate variables tend to have a Gaussian distribution in nature. Some models are named after it, such as the Gaussian process (Ebden, 2015) and the Gaussian mixture model (Shental et al., 2004). Independent component analysis (ICA) decomposes the observed data into several underlying components and sources (Hyvärinen, 2013). ICA assumes that the original components are non-Gaussian. In linear regression models, the least squares method can be derived from Equation (1) while assuming the output Y has a Gaussian distribution with a mean of $\theta^{T}X$, namely $p(D \mid \theta) = p(Y \mid X, \theta) = \mathcal{N}(\theta^{T}X, \sigma^{2})$ where σ is the standard deviation. Further, by assuming the prior p(θ) to be Gaussian with zero mean, regularized linear regression can be derived from Equation (2), whose loss function is penalized by the sum of the squares of the terms in θ.
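
The two derivations mentioned above can be written out in a few steps. The sketch below assumes N independent samples $(x^{(i)}, y^{(i)})$ and introduces τ for the standard deviation of the zero-mean Gaussian prior; both the sample notation and the symbols τ and λ are assumptions of this sketch rather than notation used elsewhere in the paper.

```latex
% Assumptions: N independent samples (x^{(i)}, y^{(i)}); \tau is the standard
% deviation of the zero-mean Gaussian prior (a symbol introduced here).
\begin{aligned}
\log p(D \mid \theta)
  &= \sum_{i=1}^{N} \log \mathcal{N}\!\left(y^{(i)};\, \theta^{T}x^{(i)},\, \sigma^{2}\right)
   = -\frac{N}{2}\log\!\left(2\pi\sigma^{2}\right)
     - \frac{1}{2\sigma^{2}} \sum_{i=1}^{N} \left(y^{(i)} - \theta^{T}x^{(i)}\right)^{2} \\
\theta_{\mathrm{MLE}}
  &= \arg\max_{\theta}\, p(D \mid \theta)
   = \arg\min_{\theta} \sum_{i=1}^{N} \left(y^{(i)} - \theta^{T}x^{(i)}\right)^{2} \\
\log p(\theta)
  &= -\frac{1}{2\tau^{2}} \lVert \theta \rVert_{2}^{2} + \text{const}
  \quad \text{for } \theta \sim \mathcal{N}(0, \tau^{2} I) \\
\theta_{\mathrm{MAP}}
  &= \arg\max_{\theta}\, p(D \mid \theta)\, p(\theta)
   = \arg\min_{\theta} \sum_{i=1}^{N} \left(y^{(i)} - \theta^{T}x^{(i)}\right)^{2}
     + \lambda \lVert \theta \rVert_{2}^{2},
  \qquad \lambda = \frac{\sigma^{2}}{\tau^{2}}
\end{aligned}
```

A narrower prior (smaller τ) gives a larger λ and therefore a stronger penalty, so the regularization strength directly encodes how confident we are that the parameters are close to zero.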

Besides Gaussian, other types of distribution assumptions are applied as well. The Student-t process is used in regression to model the priors of functions (Shah et al., 2014). In many Bayesian models, priors must be given through either the training data or manual adjustment. Inaccurate priors would cause systematic errors in the models. For instance, in spam email classification by naive Bayes classifiers, if a uniform distribution of spam and non-spam (50/50) is assumed, then the classifier is prone to report a non-spam email as spam when applied to real life, where the ratio of spam emails is much smaller, say 2%. Conversely, assuming a 2% spam ratio, the classifier would miss spam emails when applied to email accounts full of spam emails. Although methods have been proposed to alleviate this issue (Frank and Bouckaert, 2006; Rennie et al., 2003), classifiers would benefit from appropriate priors. Such a training-testing mismatch problem also occurs in other machine learning models. To the best of our knowledge, there is no perfect solution to this problem; thus, the most efficient way is to design a distribution of training data close to the test scenarios, which requires knowledge from experts.
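
The effect of the prior in the spam example follows directly from Bayes' rule. In the sketch below, the two likelihood values are invented for illustration, and the function name is hypothetical; only the 50% and 2% priors come from the example above.

```python
def posterior_spam(prior_spam, likelihood_spam, likelihood_ham):
    """Posterior probability of spam given one piece of evidence (Bayes' rule)."""
    joint_spam = prior_spam * likelihood_spam
    joint_ham = (1.0 - prior_spam) * likelihood_ham
    return joint_spam / (joint_spam + joint_ham)


# Invented likelihoods: the evidence (say, a suspicious word) appears in
# 60% of spam emails and in 5% of non-spam emails.
P_EVIDENCE_GIVEN_SPAM = 0.60
P_EVIDENCE_GIVEN_HAM = 0.05

with_uniform_prior = posterior_spam(0.50, P_EVIDENCE_GIVEN_SPAM, P_EVIDENCE_GIVEN_HAM)
with_realistic_prior = posterior_spam(0.02, P_EVIDENCE_GIVEN_SPAM, P_EVIDENCE_GIVEN_HAM)
```

The same evidence yields a posterior above 0.9 under the 50/50 prior but below 0.2 under the 2% prior, so a classifier thresholding at 0.5 would flag the email in the first case and pass it in the second.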

Deterministic Assumptions

Deterministic assumptions describe the properties and relations of objects. Some deterministic assumptions can also be expressed in a probabilistic way, such as variable independence. In addition to these, many are encoded in the "hypothesis space": for an unknown function f:X→Y, we use a machine learning model to approximate the target function. The hypothesis space is the set of all possible functions from which the algorithm determines the model that best describes the target function according to the data. For instance, in artificial neural networks, the configuration of the networks, e.g., the number of layers, activation functions, and hyperparameters, is determined in advance. Then, all combinations of weights span the hypothesis space. Through training, the optimal combination of weights is calculated. The design of networks can be integrated with human knowledge, which is elaborated in the section below.

Network Architecture

Artificial neural networks have proven to be a very powerful tool to fit data. They are extensively used in many machine learning models, especially after the emergence of deep learning (LeCun et al., 2015). Although ideally neural networks could adapt to all functions, adding components more specific to the domains can boost the performance. As mentioned in Section Deterministic Assumptions, the architecture of the network regulates the hypothesis space. Using a specialized network reduces the size of the hypothesis space and thus generalizes better with fewer parameters than universal networks. Therefore, we want to devise networks tailored to the data and tasks.

Network structures have been proposed for specific tasks. For instance, RNNs are applied to temporal data, such as language, speech, and time series; capsule neural networks are proposed to capture the pose and spatial relation of objects in images (Hinton et al., 2011). In the following, we elaborate on how symmetry is used in CNNs and how to embed different knowledge by customizing the neuron connections.

Symmetry of Convolutional Neural Networks

Symmetry, in a broad sense, denotes the property of an object which stays the same after a transformation. For instance, the label and features of a dog picture remain after rotation. In CNNs, symmetry is implemented by two concepts, "invariance" and "equivariance". Invariance means the output stays the same when the input changes, i.e., f(Tx)=f(x) where x denotes the input, T denotes the transformation, and f denotes the feature mapping (e.g. convolution calculation). Equivariance means the output preserves the change of the input, i.e., f(Tx)=T′[f(x)] where T′ is another transformation that could equal T. CNNs are powerful models for sensory data and especially images. A typical CNN for image classification is composed of mostly convolution layers possibly followed by pooling layers and fully connected ones for the last few layers. Convolution layers use filters to slide through images and calculate inner products with pixels to extract features to analyze each part of images. Such weight sharing characteristic not only greatly reduces the number of parameters but also provides them with inherent translation equivariance: the convolution output of a shifted image is the same as the shifted output of the original image. Basically, the CNN relies on convolution layers for equivariance and fully connected layers for invariance.
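
Both properties can be checked numerically on a one-dimensional signal. The sketch below makes two simplifying assumptions that are ours: the filter wraps around the signal boundary (circular correlation), so that the equivariance holds exactly, and global max pooling stands in for the later layers that provide invariance.

```python
def circular_correlate(signal, kernel):
    """Slide the kernel over the signal, wrapping around at the boundary."""
    n = len(signal)
    return [sum(k * signal[(i + j) % n] for j, k in enumerate(kernel))
            for i in range(n)]


def shift(signal, steps):
    """Cyclically shift the signal to the right by the given number of steps."""
    steps %= len(signal)
    return signal[-steps:] + signal[:-steps] if steps else list(signal)


x = [0, 1, 3, 2, 0, 0, 0, 0]
kernel = [1, -1, 2]  # the same weights are reused at every position

# Equivariance, f(Tx) = T[f(x)]: filtering a shifted input gives the
# shifted version of the original feature map.
features_of_shifted = circular_correlate(shift(x, 3), kernel)
shifted_features = shift(circular_correlate(x, kernel), 3)

# Invariance, f(Tx) = f(x): a global maximum ignores where the peak occurs.
pooled_original = max(circular_correlate(x, kernel))
pooled_shifted = max(features_of_shifted)
```

With zero padding instead of wrap-around, or with strided down-sampling, the equality holds only approximately.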

The inherent translation equivariance is limited: there are reports showing that the confidence of correct labels would decrease dramatically even with shift unnoticeable by human eyes. It is ascribed to the aliasing caused by down-sampling (stride) commonly used on convolution layers, and a simple fix is to add a blur filter before down-sampling layers (Zhang, 2019). Other symmetry groups are also considered, such as rotation, mirror, and scale. The principle of most works is to manipulate filters since they compute faster than data transformation. A simple idea is to use symmetric filters such as circular harmonics (Worrall et al., 2017) for rotation or log-radial harmonics (Ghosh and Gupta, 2019) for scale. Such equivariance or invariance is local, that is, only each pixel-filter operation has the symmetry property while the whole layer output, composed of multiple operations, does not. A global way is to traverse the symmetry space and represent with exemplary points (control points) (Cohen and Welling, 2016). For instance, in the previous dog example, we could rotate the filter by 90, 180, and 270° to calculate the corresponding feature maps. Thus, we can approximate the equivariance of rotation. Kernel convolution can be used to control the extent of symmetry, e.g. to distinguish "6" and "9" (Gens and Domingos, 2014). Overall, although invariant layers (Kanazawa et al., 2014) can also be constructed to incorporate symmetry, equivariant feature extraction as intermediate layers is preferred in order to preserve the relative pose of local features for further layers.
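
The idea of rotating the filter by 90, 180, and 270° can be sketched as follows. This is a simplified illustration and not the construction of any cited work: it applies the four rotated copies of one filter and pools the responses with a global maximum, which yields invariance to 90° rotations of the input. The image, the filter, and the function names are invented.

```python
def rot90(grid):
    """Rotate a 2-D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def correlate2d(image, kernel):
    """Valid-mode 2-D correlation: inner product at every full overlap."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]


def rotation_pooled_response(image, kernel):
    """Largest response over the four 90-degree rotations of the filter."""
    responses = []
    rotated = kernel
    for _ in range(4):
        feature_map = correlate2d(image, rotated)
        responses.append(max(max(row) for row in feature_map))
        rotated = rot90(rotated)
    return max(responses)


image = [
    [0, 1, 2, 0],
    [3, 5, 1, 0],
    [0, 2, 4, 1],
    [1, 0, 0, 2],
]
kernel = [[1, 2], [0, -1]]  # deliberately asymmetric

response = rotation_pooled_response(image, kernel)
response_rotated = rotation_pooled_response(rot90(image), kernel)
```

Rotating the input only permutes and rotates the four feature maps, so the pooled value is unchanged; keeping the four maps separate instead of pooling them would preserve orientation information for later layers.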

Design of Neuron Connections

By utilizing the knowledge of invariance and equivariance in the transformation, the CNN performs well in image classification, especially with some designed filters. From another perspective, we can also say that the CNN includes the knowledge of graphs. Imagine a pixel in a graph, which is connected to the other pixels around it. By pooling and weight sharing, the CNN generalizes the information of the pixel and its neighbors. This kind of knowledge is also applied in a graph neural network (GNN) (Gori et al., 2005; Wu et al., 2020), where the node's state is updated based on embedded neighborhood information. In the GNN, the neighbors are not necessarily the surrounding pixels but can be defined by designers. Thus, the GNN is able to represent the network nodes as low-dimensional vectors and preserve both the network topology structure and the node content information. Furthermore, the GNN can learn more informative relationships through differentiable pooling (Ying et al., 2018) and the variational expectation-maximization algorithm (Qu et al., 2019).
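
A single neighborhood update can be sketched as follows. The mean aggregation, the ReLU activation, and the three-node graph are simplifying assumptions made for illustration; the cited GNN variants use their own aggregation and update rules.

```python
def gnn_layer(states, neighbors, weight):
    """Update each node from the mean of its own and its neighbors' states."""
    new_states = {}
    for node, state in states.items():
        group = [state] + [states[n] for n in neighbors[node]]
        mean = [sum(values) / len(group) for values in zip(*group)]
        # Shared linear map followed by ReLU, applied identically at every node.
        new_states[node] = [max(0.0, sum(w * m for w, m in zip(row, mean)))
                            for row in weight]
    return new_states


# The designer defines who counts as a neighbor: here a path graph a - b - c.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
states = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # weights would normally be learned

updated = gnn_layer(states, neighbors, identity)
```

As with convolution filters, the same weight matrix is reused at every node, but the neighborhood comes from the graph rather than from a pixel grid.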

Network design benefits not only from the knowledge of general connections between nodes but also from specific relational knowledge in the connections. Encoding logic graphs (Fu, 1993) and hierarchy relationships (Deng et al., 2014), such as "A∧B→C" and "a ∈ A", into the architecture is also one way to build neural networks with Graph-Based Knowledge. With these encoding methods, some of the rules are learned from data and some are enforced by human knowledge. This idea is reflected in the cooperation of symbolic reasoning and deep learning, which is becoming increasingly popular (Garnelo and Shanahan, 2019; Mao et al., 2019; Segler et al., 2018). Symbolic reasoning or symbolic artificial intelligence (AI) (Flasiński, 2016; Hoehndorf and Queralt-Rosinach, 2017) is a good example of purely utilizing graph-based knowledge where all the rules are applied by humans and the freedom of learning is limited, while in deep learning, the rules are automatically learned from data. Building symbolic concepts into neural networks helps increase the network's interpretability and endows the networks with more possibilities, allowing more interactions between labels, for instance.

Aside from the knowledge of graphs, Equation-Based Knowledge can also be united with deep learning. For example, a general data-driven model can be added to a first-principle model (Zhou et al., 2017). The add-on neural networks will learn the complex dynamics which might be impossible to identify with pure physical models, e.g. learning the close-to-ground aerodynamics in the drone landing case (Shi et al., 2019). Another example in utilizing equation-based knowledge is using optimization equations in the layers through OptNet (Amos and Kolter, 2017), a network architecture which allows learning the necessary hard constraints. In these layers, the outputs are not simply linear combinations plus nonlinear activation functions but solutions (obtained by applying Karush-Kuhn-Tucker conditions) to constrained optimization problems based on previous layers.

In speech recognition, the words in a sentence are related, and the beginning of the sentence may have a huge impact on interpretation. This induces delays and requires accumulating information over time. Also, in many dynamical systems, especially those involving human reaction time, the effects of delays are important. The knowledge of delay is then introduced in the design of neural networks. For example, time-delay neural networks (Waibel et al., 1989) take information from a fixed number of delayed inputs and thus can represent the relations between sequential events. Neural networks with trainable delay (Ji et al., 2020) utilize the knowledge of the delay's existence and learn the delay values from data. The RNN (Graves et al., 2013; Zhang et al., 2019) is a network architecture in which neurons feed back in a similar manner to dynamical systems, where the subsequent state depends on the previous state. By inferring the latent representations of states instead of giving a label to each state, the RNN can be trained directly on text transcripts of dialogs. However, these end-to-end methods often lack constraints and domain knowledge, which may lead to meaningless answers or unsafe actions. For instance, if a banking dialog system does not require the username and password before providing account information, personal accounts could be accessed by anyone. A hybrid code network (HCN) (Williams et al., 2017) is proposed to address this concern. The HCN includes four components: entity extraction module, RNN, domain-specific software, and action templates. The RNN and domain-specific software maintain the states, and the action templates can be a textual communication or an API call. This general network allows experts to express specific knowledge, achieves the same performance with less data, and retains the benefits of end-to-end training.
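
One way that hand-written domain rules can constrain a learned dialog policy is to mask out forbidden actions before the network's scores are normalized. The sketch below illustrates this idea for the banking example; it is not the HCN implementation, and the action names, scores, and functions are hypothetical.

```python
from math import exp

ACTIONS = ["ask_credentials", "show_balance", "say_goodbye"]


def allowed_actions(authenticated):
    """Hand-written domain rule: account data requires a prior login."""
    return [True, authenticated, True]


def masked_softmax(scores, allowed):
    """Softmax over action scores with disallowed actions forced to zero."""
    peak = max(s for s, ok in zip(scores, allowed) if ok)
    weights = [exp(s - peak) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical scores from a learned model that strongly prefers "show_balance".
scores = [0.1, 2.5, 0.3]

before_login = masked_softmax(scores, allowed_actions(authenticated=False))
after_login = masked_softmax(scores, allowed_actions(authenticated=True))
chosen_before = ACTIONS[before_login.index(max(before_login))]
chosen_after = ACTIONS[after_login.index(max(after_login))]
```

The learned scores are untouched; the rule only removes options, so the unsafe action cannot be selected no matter what the model has learned from data.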

Data Augmentation

Machine learning, especially deep learning, is hungry for data. The problems with insufficient data include overfitting, where the machine generalizes poorly, and class imbalance, where the machine does not learn what we want because real-world data sets only contain a small percentage of "useful" examples (Salamon and Bello, 2017). These problems can be addressed by data augmentation, a class of techniques to artificially increase the amount of data at almost no cost. The basic approaches are transforming, synthesizing, and generating data. From the perspective of knowledge, data augmentation teaches machines invariance or incorporates knowledge-based models. Some papers do not consider simulation as data augmentation, but we will discuss it here since in essence these techniques all leverage human knowledge to generate data. Data augmentation needs more computation than explicit ways (e.g., Section Symmetry of Convolutional Neural Networks), but it is widely used in many areas such as Image, Audio, time series (Wen et al., 2020), and NLP (Fadaee et al., 2017; Ma, 2019) owing to its flexibility, simplicity, and effectiveness. It is even mandatory in unsupervised representation learning (Chen et al., 2020; He et al., 2020).

Image

Some representative data augmentation techniques for images are illustrated in Figure 2. A fundamental approach is to apply affine transformation to geometries, i.e., cropping, rotating, scaling, shearing, and translating. Noise and blur filters can be injected for better robustness. Elastic distortions (Simard et al., 2003), designed for hand-written character recognition, are generated using a random displacement field in image space to mimic uncontrolled oscillations of muscles (Wong et al., 2016). Random erasing (Zhong et al., 2020) is analogous to dropout except that it is applied to input data instead of network architecture. Kernel filters are used to generate blurred or sharpened images. Some kernels (Kang et al., 2017) can swap the rows and columns in the windows. This idea of "swapping" is similar to another approach, mixing images. There are two ways to mix images, one is cropping and merging different parts of images (Summers and Dinneen, 2019), the other is overlapping images and averaging their pixel values (Inoue, 2018). They have both been demonstrated to improve performance, though the latter is counterintuitive.

Figure 2. Illustration of Image Augmentation Techniques
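
A few of the geometric and mixing operations described above can be sketched on a grayscale image stored as a list of rows. This is only an illustration: practical pipelines use dedicated image libraries, and the function names, the fill value, and the tiny example image are assumptions of this sketch.

```python
import random


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def translate(image, dx, fill=0):
    """Shift right by dx pixels (left if negative), padding with fill.

    Assumes abs(dx) does not exceed the image width.
    """
    width = len(image[0])
    if dx >= 0:
        return [[fill] * dx + row[:width - dx] for row in image]
    return [row[-dx:] + [fill] * (-dx) for row in image]


def random_erase(image, height, width, rng, fill=0):
    """Blank out a randomly placed height x width rectangle."""
    top = rng.randrange(len(image) - height + 1)
    left = rng.randrange(len(image[0]) - width + 1)
    return [[fill if top <= i < top + height and left <= j < left + width else px
             for j, px in enumerate(row)]
            for i, row in enumerate(image)]


def mix(image_a, image_b):
    """Overlap two images of equal size by averaging their pixel values."""
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]


image = [[1, 2, 3], [4, 5, 6]]
rng = random.Random(0)
augmented = [
    horizontal_flip(image),
    translate(image, 1),
    random_erase(image, 1, 2, rng),
    mix(image, horizontal_flip(image)),
]
```

Each function returns a new image and leaves the input untouched, so several augmentations can be drawn from the same original sample.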

Another perspective is to change the input data in color space (Shorten and Khoshgoftaar, 2019). Images are typically encoded in the RGB (red, green, blue) color space, i.e., represented by three channels (matrices) indicating red, green, and blue. These values depend on brightness and lighting conditions. Therefore, color space transformation, also called photometric transformation, can be applied. A quick transformation is to increase or decrease the pixel values of one or all three channels by a constant. Other methods include setting thresholds on color values and applying filters to change the color space characteristics. Besides RGB, there are other color spaces such as CMY (cyan, magenta, yellow), HSV (hue, saturation, and value, which denotes intensity), and CIELab. The performance varies with color spaces. For example, a study tested four color spaces in a segmentation task and found that CMY outperformed the others (Jurio et al., 2010). It is worth noting that human judgment is important in deciding how much freedom to allow in color transformation since some tasks are sensitive to colors, e.g., distinguishing between water and blood. In contrast to augmentation, another direction to tackle color variance is to standardize the color space, such as by adjusting the white balance (Afifi and Brown, 2019).

Deep learning can be used to generate data for augmentation. One way is adversarial training which consists of two or more networks with contrasting objectives. Adversarial attacking uses networks to learn augmentations to images that result in misclassifications of their rival classification networks (Goodfellow et al., 2014; Su et al., 2019). Another method is generative models, such as generative adversarial networks and variational auto-encoders. These models can generate images to increase the amount of data (Lin et al., 2018). Style Transfer (Gatys et al., 2015), best known for its artistic applications, serves as a great tool for data augmentation (Jackson et al., 2019). In addition to images in the input space, data augmentation can also be applied to the feature space, i.e., the intermediate layers of neural networks (Chawla et al., 2002; DeVries and Taylor, 2017).
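
Augmentation in feature space is often based on interpolation between existing samples of the same class. The sketch below shows only this interpolation step; the selection of a suitable neighbor (for example, one of the nearest neighbors of the same class) is omitted, and the vectors and function name are invented for illustration.

```python
import random


def interpolate_features(anchor, neighbor, rng):
    """Create a synthetic sample on the segment between two feature vectors."""
    t = rng.random()  # position along the segment, in [0, 1)
    return [a + t * (b - a) for a, b in zip(anchor, neighbor)]


rng = random.Random(0)
anchor = [1.0, 4.0, -2.0]     # feature vector of a minority-class sample
neighbor = [3.0, 0.0, -1.0]   # feature vector of a nearby sample, same class
synthetic = [interpolate_features(anchor, neighbor, rng) for _ in range(5)]
```

The implicit knowledge being used is that the region between two nearby samples of one class is likely to belong to that class as well.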

Audio

There are many types of audio, such as music and speech, and accordingly many types of tasks. Although augmentation methods vary with audio types and tasks, the principles are universal.

Tuning is frequently used by music lovers and professionals to play or post-process music. There is a lot of mature software for it. They can stretch time to shift pitch or change the speed without pitch shifting. More advanced tuning involves reverberation (sound reflection in a small space), echo, saturation (non-linear distortion caused by overloading), gain (signal amplitude), equalization (adjusting balance of different frequency components), compression, etc. These methods can all be used in audio data augmentation (Mignot and Peeters, 2019; Ramires and Serra, 2019). Unfavorable effects in tuning, such as noise injection and cropping, are used as well in audio data augmentation.

An interesting perspective is to convert audio to images on which augmentations are based. Audio waveforms are converted to spectrograms which represent the intensity of a given frequency (vertical axis) at a given time (horizontal axis). Then, we can modify the "image" by distorting in the horizontal direction, blocking several rows, or blocking several columns (Park, 2019). These augmentations help the networks to be robust against, respectively, time direction deformations, partial loss of frequency channels, or partial loss of temporal segments of the input audio. This idea was extended by introducing more policies such as vertical direction distortion (frequency warping), time length control, and loudness control (Hwang et al., 2020). Other techniques used in image augmentation, e.g., rotation and mixture, are attempted as well (Nanni et al., 2020).
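
The row and column blocking described above can be sketched on a spectrogram stored as a list of frequency rows. The sketch covers only the two masking operations and omits the horizontal distortion; the mask widths, the fill value, and the function name are assumptions chosen for illustration.

```python
import random


def mask_spectrogram(spec, max_freq_width, max_time_width, rng, fill=0.0):
    """Blank one random band of rows (frequency) and one of columns (time).

    spec[f][t] holds the intensity of frequency bin f at time step t.
    Assumes the maximum widths do not exceed the spectrogram dimensions.
    """
    n_freq, n_time = len(spec), len(spec[0])
    f_width = rng.randint(0, max_freq_width)
    t_width = rng.randint(0, max_time_width)
    f_start = rng.randint(0, n_freq - f_width)
    t_start = rng.randint(0, n_time - t_width)
    return [[fill
             if f_start <= f < f_start + f_width or t_start <= t < t_start + t_width
             else value
             for t, value in enumerate(row)]
            for f, row in enumerate(spec)]


rng = random.Random(0)
spectrogram = [[1.0] * 8 for _ in range(6)]  # 6 frequency bins, 8 time steps
masked = mask_spectrogram(spectrogram, max_freq_width=2, max_time_width=3, rng=rng)
```

Drawing new random masks each time the same recording is used gives the network many partially hidden views of one example.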

Simulation

As humans create faster and more accurate knowledge-based models to simulate the world, using simulations to acquire a large amount of data becomes an increasingly efficient method for machine learning. The primary advantage of simulations is the ability to gather a large amount of data when doing so experimentally would be costly, time consuming, or even dangerous (Ruder, 2017b; Ruiz et al., 2018). In some cases, acquiring real-world data may even be impossible without already having some training through simulations.

One application that conveys the important role of human knowledge in simulation data is computer vision. Humans can use their visual knowledge to develop powerful visual simulations to train computers. Autonomous vehicles, for instance, can be trained by simulated scenarios. They can learn basic skills before running on the road and can also be exposed to dangerous scenes that are rare naturally. Open-source simulation environments and video games such as Grand Theft Auto V (Martinez et al., 2017) can be used to reduce the time and money required to build simulations. Besides autonomous vehicles, simulation data have been used for computer vision in unique works such as cardiac resynchronization therapy (Giffard-Roisin et al., 2018), injection molding (Tercan et al., 2018), and computerized tomography (CT) scans (Holmes et al., 2019). Each of these applications requires thorough human knowledge of the subject. Lastly, robotics is a field where simulation data are expected to play a significant role in future innovation. Training robotics in the real world is too expensive, and the equipment may be damaged (Shorten and Khoshgoftaar, 2019). By incorporating human models, simulation data can allow robotics to be trained safely and efficiently.

Improvement through future research will accelerate the adoption of this technique for more tasks. Human experience and knowledge-based models will make simulations in general more realistic and efficient. Therefore, simulation data will become even more advantageous for training. At the same time, data-driven research aims to find optimal simulations that provide the most beneficial data for training the real-world machine. For instance, reinforcement learning is used to quickly discover and converge to simulation parameters that provide the data which maximizes the accuracy of the real-world machine being trained (Ruiz et al., 2018). While data-driven methods will reduce the human knowledge necessary for controlling the simulation, they will not replace the necessity of human knowledge for developing the simulation in the first place.

Feedback and Interaction

As humans, we gain knowledge mostly from interactions with the environment. In some algorithms, the machine is designed to interact with humans. Including a human in the loop (Holzinger, 2016; Holzinger et al., 2019) can help interpret the data, promote the efficiency of learning, and enhance the performance.

A typical method that demonstrates how machines can interact with the environment, including humans and preset rules, is discussed in Section Reinforcement Learning. Knowledge can also be injected through the rewards. In Section Active Learning, machines may ask humans for data labeling or distribution. In Section Interactive Visual Analytics, machines interact with humans by bringing new knowledge through visualization while seeking manual tuning.

Reinforcement Learning

Reinforcement learning is a goal-directed algorithm (Sutton and Barto, 2018) which learns the optimal solution to a problem by maximizing the benefit obtained in interactions with the environment.

In reinforcement learning, the component which makes decisions is called the "agent" and everything outside of and influenced by the agent is the "environment". At time $t$, the agent has some information about the environment which can be denoted as a "state" $S_t$. According to this information, it takes an action $A_t$ under a policy $\pi_t(a \mid s)$, which is the probability of choosing $A_t=a$ when $S_t=s$. This action leads to the next state $S_{t+1}$ and a numerical reward $R_{t+1}$. The transition between two states is given by interactions with the environment and is described by the environment model. If the probability of transition between states is given, we can solve the problem by applying model-based methods. When the model is unknown, which is usually the case in real life, we can also learn the model from the interactions and use the learned model to simulate the environmental behaviors if interactions are expensive.

The agent learns the optimal policy by maximizing the accumulated reward $G_t$ (called the return, typically used in episodic problems) or the average reward (in continuing problems). In general, the collected rewards can be evaluated by the state value $V_\pi(s) := \mathbb{E}_\pi[G_t \mid S_t=s]$ or the action value $Q_\pi(s,a) := \mathbb{E}_\pi[G_t \mid S_t=s, A_t=a]$. Based on different policy evaluation approaches, the methods used in reinforcement learning can be categorized into two types. One type evaluates the policy by the value function $V_\pi(s)$ or $Q_\pi(s,a)$. The value function is estimated through a table (tabular methods) or an approximator (function approximation methods). In tabular methods, the exact values of the states or state-action pairs are stored in a table; thus, tabular methods are usually limited by computation and memory (Kok and Vlassis, 2004). Function approximation (Xu et al., 2014) provides a way to bypass the computation and memory limits in high-dimensional problems. The states and actions are generalized into different features, the value functions become functions of those features, and the weights in the functions are learned through interactions. The other type expresses the policy with its own approximator, which is independent of the value function, so that the agent learns the policy directly. These methods are called policy gradient methods and include actor-critic methods, which learn approximations to both the policy and the value function (Silver et al., 2014; Sutton et al., 2000).
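To make the tabular case concrete, the sketch below applies Q-learning, one common tabular action-value method chosen here for illustration, to a hypothetical five-state corridor. The environment (`step`), the hyperparameters, and the reward of 1 at the goal are all assumptions made for this example; they do not come from the works cited above.

```python
import random

N_STATES, GOAL = 5, 4          # states 0..4 on a line; reaching state 4 ends the episode
ACTIONS = (-1, +1)             # move left or move right

def step(state, action):
    """Hypothetical environment model: deterministic move, reward 1 at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # the table: one row of action values per state
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy policy derived from the current action values
            # (ties are broken at random so the untrained agent still explores).
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = step(s, ACTIONS[a])
            # Move Q(s, a) toward the one-step bootstrapped target.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q = q_learning()
# Greedy action in each non-terminal state after training.
greedy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:GOAL]]
```

The table has one entry per state-action pair, which is why this representation becomes impractical as the state space grows and function approximation takes over.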

Feedback from the human or the environment as a reward in reinforcement learning is essential since it is a goal-directed learning algorithm. There are many tasks for which human experience remains useful, and for those tasks, it would be efficient and preferable to obtain the knowledge from humans directly and quickly. Humans can participate in the training process of reinforcement learning in two ways: one is to indirectly shape the policy by constructing the reward function, and the other is to directly intervene in the policy during learning. In the former way, when the goal of some tasks is based on human satisfaction, we need humans to give the reward signal and ensure that the agent fulfills the goal as we expect (Knox and Stone, 2008). With human rewards included, the policy is pushed indirectly toward the optimal one under the human's definition, and the learning process is sped up (Loftin et al., 2016). Aside from giving a reward manually after each action, human knowledge can also be used to design the reward function, e.g., by giving larger positive weights to the important indicators in multi-goal problems (Hu et al., 2019). A recent work summarizes how to inject human knowledge into a tabular method with reward shaping (Rosenfeld et al., 2018). In the latter way, guidelines from humans enter the policy directly. Human feedback can modify the exploration policy of the agent and participate in the action selection mechanism (Knox and Stone, 2010, 2012). The feedback on the policy can be not only a numerical value but also a label on the optimal actions (Griffith et al., 2013). By adding the label, human feedback changes the policy directly instead of influencing the policy through rewards. Recent works show that human feedback also depends on the agent's current policy, which enables useful training strategies (MacGlashan et al., 2017), and that involving humans in the loop of reinforcement learning improves learning performance (Lin et al., 2017).
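The indirect route, in which knowledge enters through the reward, can be sketched very simply. In the hypothetical function below, human-chosen weights emphasize the important indicators of a multi-goal problem, and an optional human feedback signal is added on top. The function name, the indicator names, and the additive combination are assumptions for illustration rather than the formulation of any cited work.

```python
def shaped_reward(indicators, weights, human_feedback=0.0, feedback_scale=1.0):
    """Combine task indicators with human-chosen weights and an optional
    human feedback signal (for example, +1 for approval, -1 for disapproval)."""
    env_term = sum(weights[name] * value for name, value in indicators.items())
    return env_term + feedback_scale * human_feedback

# Hypothetical driving task: safety is weighted three times as much as speed,
# and the human trainer disapproves of the last action.
reward = shaped_reward(
    indicators={"speed": 0.5, "safety": 1.0},
    weights={"speed": 1.0, "safety": 3.0},
    human_feedback=-1.0,
)
```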

Active Learning

Active learning, by selecting partial data to be annotated, aims to resolve the challenge that labeled data are more difficult to obtain than unlabeled data. During the training process, the learner poses "queries" such as some unlabeled instances to be labeled by an "oracle" such as a human. An example of an active learning algorithm for classification is shown in Figure 3. Initially, we have some labeled data and unlabeled data. After training a model based on the labeled data, we can search for the most informative unlabeled data and query the oracle to obtain its label. Eventually, we can have an excellent classifier with only a few additional labeled data. In this scenario, the learner makes a decision on the query after evaluating all the instances; thus, it is called "pool-based sampling". Other scenarios include "stream-based selective sampling" (each unlabeled data point is examined one at a time with the learner evaluating the informativeness and deciding whether to query or discard) and "query synthesis" (the learner synthesizes or creates its own instance).

Figure 3. Pseudocode of an Active Learning Example

Rephrased from (Settles, 2012)
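The pool-based loop can be illustrated with a deliberately small, runnable sketch. It assumes a one-dimensional threshold classifier, a hypothetical oracle whose true boundary is 0.62, and uncertainty measured as distance to the current decision boundary; none of these choices is taken from Figure 3 itself.

```python
def oracle(x):
    """Stand-in for the human annotator (hypothetical ground truth)."""
    return int(x > 0.62)

def fit_threshold(labeled):
    """Train a 1-D classifier: the midpoint between the closest opposite-class points."""
    lo = max(x for x, y in labeled if y == 0)
    hi = min(x for x, y in labeled if y == 1)
    return (lo + hi) / 2

def active_learning(pool, labeled, budget):
    pool = list(pool)
    labeled = list(labeled)
    for _ in range(budget):
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: the instance nearest the decision boundary
        # is the one the current model is least sure about.
        query = min(pool, key=lambda x: abs(x - threshold))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled), labeled

pool = [i / 100 for i in range(1, 100)]   # unlabeled instances 0.01 .. 0.99
seed_labels = [(0.0, 0), (1.0, 1)]        # small initial labeled set
threshold, labeled = active_learning(pool, seed_labels, budget=8)
```

With eight queries the learned threshold falls between 0.62 and 0.63, the two pool points that bracket the true boundary. In this toy setting the active queries behave like a binary search, whereas eight random queries from a pool of 99 would usually leave a much wider gap.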

There are many ways to define how informative a data point is; in other words, "query strategies" vary. As illustrated in Figure 4, effective query strategies help the trained model outperform the one trained by random sampling. Choosing a good query strategy is therefore a central issue in active learning. Strategies fall into the following categories (Settles, 2012): (1) uncertainty sampling, which measures the uncertainty of the model's prediction on data points; (2) query by disagreement, which trains different models and checks their differences; (3) error and variance reduction, which directly looks into the generalization error of the learner. Besides, there are many other variants, such as density or diversity methods (Settles and Craven, 2008; Yang et al., 2015), which consider the representativeness (reflection on input distribution) of instances in uncertainty sampling, clustering-based approaches (Dasgupta and Hsu, 2008; Nguyen and Smeulders, 2004; Saito et al., 2015), which cluster unlabeled data and query the most representative instances of those clusters, and the min-max framework (Hoi et al., 2009; Huang et al., 2010), which minimizes the maximum possible classification loss. More versatile methods include combining multiple criteria (Du et al., 2015; Wang et al., 2016; Yang and Loog, 2018), choosing strategies automatically (Baram et al., 2004; Ebert et al., 2012), and training models to control active learning (Bachman et al., 2018; Konyushkova et al., 2017; Pang et al., 2018).
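For uncertainty sampling, three commonly used measures are least confidence, margin, and entropy. The sketch below computes them on hypothetical class-probability vectors; the instance names and probabilities are made up for illustration, and the measures are standard definitions rather than something specified in the text above.

```python
import math

def least_confidence(probs):
    """1 minus the probability of the most likely class (larger = more uncertain)."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two most likely classes (smaller = more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Shannon entropy of the predicted distribution (larger = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical model outputs for three unlabeled instances over three classes.
predictions = {
    "a": [0.90, 0.05, 0.05],
    "b": [0.40, 0.35, 0.25],
    "c": [0.50, 0.50, 0.00],
}
query_by_entropy = max(predictions, key=lambda k: entropy(predictions[k]))
query_by_margin = min(predictions, key=lambda k: margin(predictions[k]))
```

Here entropy selects instance "b" while margin selects instance "c", a small reminder that the choice of measure can change which instance is sent to the oracle.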

Figure 4. An Illustration of Active Learning: Choosing Data to Inquire for Better Estimation When Labeled Data Are Not Sufficient

Data shown are randomly generated from two Gaussian distributions with different means. Drawn based on the concept in (Settles, 2012).

(A) Correct labels of the binary classification problem. The line denotes the decision boundary.

(B) A model trained by random queries.

(C) A model trained by active queries.

In addition to asking the oracle to label instances, queries may seek more advanced domain knowledge. A simple idea is to solicit information about features. For instance, besides instance-label queries, a text classifier (Raghavan et al., 2006) may query the relevance between features (words) and classes, e.g., "is 'puck' discriminative for determining whether documents are about basketball or hockey?" Then, the vectors of instances are scaled to reflect the relative importance of features. Another way is to set constraints based on high-level features (Druck et al., 2009; Small et al., 2011). The oracle may be queried on probabilities, e.g., "what is the percentage of hockey documents when the word 'puck' appears?" The learning algorithm then tries to adjust the model to match the label distributions over the unlabeled pool. Other methods to incorporate features include adjusting priors in naive Bayes models (Settles, 2011), mixing models induced from rules and data (Attenberg et al., 2010), and label propagation in graph-based learning algorithms (Sindhwani et al., 2009). Humans sometimes do a poor job in answering such questions, but it is found that specifying many imprecise features may result in better models than specifying fewer, more precise features (Mann and McCallum, 2010).

Although most active learning work is on classification, the principles also apply to regression (Burbidge et al., 2007; Willett et al., 2006). Recent work focuses on leveraging other concepts for larger data sets (e.g., image data), such as deep learning (Gal et al., 2017), reinforcement learning (Liu et al., 2019), and adversarial networks (Sinha et al., 2019).

Interactive Visual Analytics

Visual analytics (VA) is a field where information visualization helps people understand the data and concepts in order to make better decisions. One core task in VA is dimension reduction, which can be well addressed through machine learning (Sacha et al., 2016). Applying interactive VA, which allows users to give feedback in the modeling-visualizing loop, will make machine learning tools more approachable in model understanding, steering, and debugging (Choo and Liu, 2018).

In the combination of machine learning and VA, human knowledge plays an indispensable role as interactions and feedback to the system (Choo et al., 2010). Interactions and feedback can happen either in the visualization part or the machine learning part. In the former part, visualization systems satisfy users' requirements through interacting with them and assisting users in having a better understanding of the data analyzed by some machine learning methods. For instance, principal component analysis (PCA) is a powerful machine learning method which transforms the data from the input space to the eigenspace and reduces the dimension of the data by choosing the most representative components. However, for many users, PCA works as a "black box" and it is difficult to decipher the relationships in the eigenspace. Interactive PCA (Jeong et al., 2009) provides an opportunity for the users to give feedback on system visualization, and these interactions are reflected immediately in other views so that the user can identify the dimension in both the eigenspace and the original input space. In the machine learning part, interactions and feedback from humans help machine learning methods to generate more satisfying and explainable results and make the results easier to visualize as well. A good example of utilizing human interactions in machine learning methods is the application in classification (Fails and Olsen, 2003; Ware et al., 2001). These interactive classifiers allow users to view, classify, and correct the classifications during training. Readers can refer to a recent comment (Holzinger et al., 2018) on explainable AI.
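To illustrate the link between the eigenspace and the original input space mentioned for PCA, the following sketch extracts the first principal component of a small two-feature data set using power iteration on the covariance matrix. The data, the function name, and the use of power iteration (rather than a full eigendecomposition) are assumptions made to keep the example dependency-free; it is not the interactive PCA system of the cited work.

```python
def first_principal_component(data, iterations=100):
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Coordinates of each instance in the one-dimensional eigenspace.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores

# Hypothetical data in which the second feature is roughly twice the first.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]]
component, scores = first_principal_component(data)
```

The entries of `component` are the loadings of the original features on the first component; their ratio of roughly 2 recovers the relation built into the data, which is the kind of mapping an interactive view would expose to the user.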

The mutual interaction between human knowledge and the visualization or the model is an iterative process. A better visualization leads the users to learn more practical information. After the users gain some knowledge regarding the model and data, they can utilize the knowledge to further improve the model and even the learning algorithm (Hohman et al., 2018). This iterative process has a steering effect on the model, which can be viewed as the parameter evolution in dynamical systems shown in Equation (4) (Díaz et al., 2016; Endert et al., 2017):

$$\dot{y} = f(y, u), \qquad v = g(y), \tag{4}$$

where $y$ is the model under the machine learning algorithm, $v$ is the visualization of that model, and $u=\{x,w\}$ is the input including the new input data $x$ and users' feedback $w$. The feedback $w$ is based on users' knowledge as well as the visualization $v$. Training of the model is complete when the dynamical system settles down at a certain equilibrium, $y^*$.
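The dynamical-system view can be simulated with a forward-Euler discretization, $y_{k+1} = y_k + \Delta t \, f(y_k, u_k)$. In the sketch below, the scalar model state, the particular forms of `f` and `g`, and the constant user feedback are all hypothetical choices made only to show the loop settling at an equilibrium; real systems would have high-dimensional models and feedback that reacts to the visualization.

```python
def f(y, u):
    """Hypothetical dynamics: the model moves toward a blend of data x and feedback w."""
    x, w = u
    return (0.5 * x + 0.5 * w) - y

def g(y):
    """Hypothetical 'visualization' of the model state shown to the user."""
    return round(y, 3)

def steer(y0, x, user, dt=0.2, tol=1e-6, max_steps=1000):
    """Iterate the model-visualization-feedback loop until dy/dt is (almost) zero."""
    y = y0
    for step in range(max_steps):
        v = g(y)                 # visualize the current model
        w = user(v)              # the user responds to what is shown
        dy = f(y, (x, w))        # input u = {x, w}
        if abs(dy) < tol:        # equilibrium reached
            return y, step
        y += dt * dy             # forward-Euler update
    return y, max_steps

# A user who always asks for the value 3.0, whatever is displayed (a simplification).
y_star, steps = steer(y0=0.0, x=1.0, user=lambda v: 3.0)
```

With these assumed dynamics the loop settles near $y^* = 2$, the point where the data term and the feedback term balance.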

Some progress has been made in this interdisciplinary area to help machine learning become more accessible. Semantic interaction (Endert et al., 2012) is an approach that enables co-reasoning between the human and the analytic models used for visualization without directly controlling the models. Users can manipulate the data during visualization, and the model is steered by their actions. In this case, interactions happen with the help of visualization and affect both model and visual results. Interactive visual tools can also be built for understanding and debugging current machine learning models (Bau et al., 2019; Karpathy et al., 2015; Zeiler and Fergus, 2014). The language model visual inspector system (Rong and Adar, 2016) is able to explore the word embedding models. It allows the users to track how the hidden layers are changing and inspect the pairs of words. The reinforcement learning VA systems, DQNViz (Wang et al., 2018) and ReLVis (Saldanha et al., 2019), allow users to gain insight about the behavior pattern between training iterations in discrete and continuous action space, respectively. As the users explore the machine learning algorithms better, they can compare different methods and adjust the model faster.

Meanwhile, increasing the understandability of machine learning makes those algorithms more trustworthy and actionable and extends the application to more areas. An example is visualizing CNNs for autonomous driving where visualization serves as a debugging tool for real-time CNN-based systems. This is done by visualizing the regions of the input image which have the highest influence on the output (Bojarski et al., 2016). A recent paper (Endert et al., 2017) gives a summary of literature on how interactive VA is involved with the machine learning domains of dimension reduction, clustering, classification, and regression. It also shows some application domains for integrating machine learning with VA, such as text analytics and biological data analytics.

Parameter Initialization

In essence, all machine learning problems are optimization problems where we minimize the error/loss or maximize the benefit/probability from an initial starting point. A bad initialization may lead to a slow converging path or even a sub-optimal result. In the following sections, we will introduce some works which apply human knowledge to initialization. We introduce how an agent learns from expert behaviors in Pre-training in Reinforcement Learning and how to use trained models by Transfer Learning. Although transfer learning is not limited to initialization, we categorize it here considering that fine-tuning pre-trained models is a widely used technique of transfer learning.

Pre-training in Reinforcement Learning

Many reinforcement learning problems must be solved in a large state/action space, especially problems with continuous states and actions (Lazaric et al., 2008). Learning in a high-dimensional space requires huge amounts of data and learning time. Also, in many optimization methods, such as the gradient-descent method and Newton's method, the initialization plays a critical role and may determine whether we are able to find the optimal value function/policy. Thus, using a pre-trained function as initialization in reinforcement learning has become popular. Supervised learning is often used at the beginning stage of reinforcement learning, and the domain knowledge from humans is embedded in this way of initialization.

Pre-training can be applied to policy learning, value function learning, environment model learning, or a combination of them. In policy learning, humans can act as a trainer to teach agents the target parameterized policy through demonstration. A study provides a comprehensive survey of many different approaches to learning from demonstrations (Argall et al., 2009), and these approaches allow the agent to have a good initial policy before fine-tuning with interactive feedback. In the training of AlphaGo with DNNs and tree search (Silver et al., 2016), a supervised learning policy network is pre-trained directly from human expert moves. Then, a reinforcement learning policy is trained to improve it by optimizing the final outcomes. In robot navigation, reinforcement learning is capable of learning the fuzzy rules automatically but suffers from a heavy learning phase and insufficiently learned rules. Including supervised learning results as initialization in value function learning helps solve the issue and has become one of the main approaches for this problem (Fathinezhad et al., 2016; Navarro-Guerrero et al., 2012; Ye et al., 2003). Learning a model of state dynamics can result in a pre-trained hidden layer structure that reduces the training time in reinforcement learning problems (Anderson et al., 2015), and learning the deep Q networks from human demonstrators also helps to give a relatively good initial model and predict the dynamics (Gabriel et al., 2019). There are many other applications of smart initialization on policy gradient methods (Yun et al., 2017) and Q-learning methods (Burkov and Chaib-Draa, 2007; Song et al., 2012), which speed up the learning and improve the performance (Finn et al., 2016).
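One simple way to picture initialization from demonstrations is to convert expert state-action pairs into an initial action-value table, so that the agent's first greedy choices imitate the expert before any interaction takes place. The function below is a hypothetical sketch of that idea (frequency counts scaled by an assumed `bonus`); it is not the procedure of any work cited above.

```python
from collections import Counter, defaultdict

def pretrain_q_from_demos(demos, actions, bonus=1.0):
    """Build an initial Q-table from (state, action) demonstrations:
    actions the expert chose more often in a state start with higher values."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    q = {}
    for state, c in counts.items():
        total = sum(c.values())
        q[state] = {a: bonus * c[a] / total for a in actions}
    return q

# Hypothetical expert demonstrations on a small corridor task.
demos = [(0, "right"), (0, "right"), (0, "left"), (1, "right")]
q0 = pretrain_q_from_demos(demos, actions=("left", "right"))
initial_policy = {s: max(values, key=values.get) for s, values in q0.items()}
```

The resulting table would then be refined by ordinary reinforcement learning updates; states never visited by the expert simply keep whatever default initialization the learner uses.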

Transfer Learning

Transfer learning is where knowledge is learned from a "source task" and applied to a "target task" (Pan and Yang, 2009). This method is inherently dependent on human knowledge to determine suitable source tasks to transfer. The transferred knowledge can be data, neural networks, weights, etc. The ideal situation to use transfer learning is when the source task has more data available than the target task. However, transfer learning can yield benefits even when the source task does not have as much data. As shown in Figure 5, traditionally in machine learning, the data is specific to the task being trained. By transferring knowledge from the already trained source task, less data specific to the target task are needed and the training time is reduced. The context provided from the source task increases the initial performance, rate of improvement, and final performance (Brownlee, 2017) while reducing the training time and the data needed. This efficiency has led to significant commercial use of transfer learning (Ruder, 2017b).

Figure 5. Illustration of Traditional Machine Learning and Transfer Learning

(A) Tasks in traditional machine learning do not share knowledge.

(B) Tasks in transfer learning share knowledge. Target task can reuse the knowledge of source tasks.

Drawn based on the concept in (Pan and Yang, 2009).

The role of human knowledge in transfer learning can be understood through the NLP problem of training a DNN for children's automatic speech recognition (ASR) (Shivakumar and Georgiou, 2020). There is a plethora of general speech data available, yet there is a lack of data specific to child speech recognition. It would be costly and time consuming to acquire data to train a child ASR algorithm from scratch. Instead, the data, neural networks, and weights from general ASR training can be transferred. Then, the weights can be fine-tuned for child ASR by using the limited amount of data specific to child ASR. The machine cannot know which source tasks would be beneficial to train the target task, so human knowledge is required to determine compatible and effective source tasks. Transfer learning is also used for image recognition where a human finds a source task to quickly train the target task before fine-tuning it with new visual data. This has been used in a wide variety of applications such as plant identification (Ghazi et al., 2017), structural damage detection (Gao and Mosalam, 2018; Gopalakrishnan et al., 2017), human behavior recognition (Kaya et al., 2017; Sargano et al., 2017), and even medical research (Burlina et al., 2017; Christodoulidis et al., 2016; Karri et al., 2017).
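The fine-tuning step can be sketched with a toy regression problem standing in for the speech example. A linear model is first trained on plentiful source-task data and then fine-tuned for a few steps on scarce target-task data, and it is compared with a model trained from scratch under the same small budget. The data, the learning rate, and the step counts are assumptions chosen for illustration.

```python
def train(data, w=0.0, b=0.0, lr=0.05, steps=100):
    """Gradient descent on mean squared error for the model y = w * x + b."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def mse(data, w, b):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# Source task: plenty of data from y = 2x + 1 (hypothetical).
source = [(k / 10, 2.0 * k / 10 + 1.0) for k in range(-20, 21)]
# Target task: only three points from the related relation y = 2x + 1.5.
target = [(-1.0, -0.5), (0.0, 1.5), (1.0, 3.5)]

w_src, b_src = train(source, steps=500)                 # pre-train on the source task
scratch = train(target, steps=5)                        # small budget, no transfer
finetuned = train(target, w=w_src, b=b_src, steps=5)    # same budget, transferred weights

error_scratch = mse(target, *scratch)
error_finetuned = mse(target, *finetuned)
```

Because the source and target relations share the same slope, the transferred weights start close to the target solution and reach a much lower error within the same five steps. Choosing a source task with this kind of similarity is exactly where human knowledge enters.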

New methods of incorporating human knowledge with transfer learning are being researched to achieve higher efficiency and broader application. One of them is using simulation data. As we have mentioned in Section Simulation, machine learning models can be trained with simulations before being fine-tuned with real-world data.

Another method that benefits from human knowledge is "heterogeneous transfer learning" (Day and Khoshgoftaar, 2017). Most commercial transfer learning methods are homogeneous, which means that the source and target tasks have the same feature space. For instance, in computer vision, the simulation looks different from the real world but the feature space is the same since they are both inputting pixels. In heterogeneous transfer learning, the feature spaces are different. For instance, the inputs are texts of two different languages, or one input is texts while the other is images. Heterogeneous transfer, by correlating different feature spaces, allows the source task to come from a wider variety of data. Extensive human knowledge on the subject is required to correlate the feature spaces effectively. The complexity of this correlation makes it difficult to scale heterogeneous transfer learning for broad use because each application of heterogeneous transfer learning requires a different subject of human knowledge. Other heterogeneous transfer learning techniques are being attempted to solve this problem, such as deep semantic mapping (Zhao et al., 2019) and hybrid heterogeneous transfer learning (Zhou et al., 2014, 2019).

Lastly, the importance of human knowledge for transfer learning is apparent from the poor performance that can occur without it. Sometimes the performance may actually be worse than if the target task were trained alone. This is known as negative transfer and occurs when the source tasks are poorly suited for the target tasks (Wang et al., 2019). It appears in human learning as well, e.g., learning to throw a baseball may be harder after learning to throw a football because muscle memory makes it difficult to adapt to a new throwing motion. Currently, preventing negative transfer requires effective human intuition or experience. Research is being conducted to develop methods that quantitatively reduce negative transfer, including using a discriminator gate to assign different weights to each source task (Wang et al., 2019) and using an iterative method that detects the source of the negative transfer to reduce class noise (Gui et al., 2018).

Conclusions

In this paper, we give a comprehensive review on integrating human knowledge into machine learning. Knowledge is categorized into general knowledge and domain knowledge, and its representations are introduced together with the works that leverage them. We focus on some new and popular topics and group the methods by their major contribution to the machine learning pipeline. In conclusion, based on existing methods, we propose the following suggestions on improving the machine learning performance with knowledge:

  1. Devise the inputs and outputs of models to make better use of resources. Aggregate tasks to learn together by Multitask Learning if they share data or information; an auxiliary task can be attempted even if only a single task is important. Use Features that could best represent the essence of tasks; the features can be manually engineered, selected by statistical metrics, or automatically learned by machine learning models.

  2. Examine model assumptions to capture major factors. Set Variable Relation such as independence based on prior knowledge. The Distribution of unknown variables in models can be obtained from empirical data or expert intuition, or assumed to be Gaussian. Try to match the distribution of training data with test scenarios.

  3. If using neural networks, tailor the architecture to be suitable for the tasks. If possible, incorporate some known properties, such as Symmetry of Convolutional Neural Networks. Logic, equations, and temporal nature can be, respectively, reflected in the structure of networks by, for instance, combining with symbolic AI, designing special layers/architectures, and using RNNs.

  4. Augment data to incorporate invariant properties or knowledge-based models. Augmentation can be done by transforming, manipulating, or synthesizing the original data. Image and Audio data have been discussed in detail, and their augmentations share similar principles. Simulations built upon knowledge-based models can be used to generate data.

  5. Design algorithms to include humans in the loop. The interaction between machine and environment can be modeled and optimized in Reinforcement Learning. Humans can be asked to label data or provide distribution (Active Learning). Interactive Visual Analytics can be used to help humans understand machine learning results and then adjust models during or after training.

  6. Find better initialization to reflect known results. This could be achieved by learning from expert behaviors before allowing the machine to automatically learn from the environment (Pre-training in Reinforcement Learning). Transfer Learning can be used to distill knowledge from relevant tasks.

For future works, we highlight the following directions:

  1. Models are dedicated and specific to tasks rather than universal. We witnessed the emergence of CNNs for images and RNNs for natural language. They are intuitively and empirically better than fully connected networks. General knowledge inspires us to leverage math and brains to propose more efficient mechanisms, such as attention (Vaswani et al., 2017). Domain knowledge captures the nature of the tasks, and more customized components can be incorporated.

  2. More nodes are added to end-to-end learning for human interaction, feedback, and intervention. Despite convenient data preparation, the black box approach of end-to-end learning makes it difficult to explain and control. We can regulate the intermediate results or network layers (Zhang et al., 2018) to produce models more understandable and controllable by humans.

  3. Existing results are reused for new targets. Humans can use the skills and insights across multiple tasks and even disciplines; some abilities are innate. Similarly, machines do not need to be trained from scratch. Given a new task, we can transfer or distill the knowledge from previous tasks or models.

  4. Higher level features such as conceptual understanding and math theorems are incorporated. Currently, the knowledge integrated in machine learning is relatively concrete and mostly at the instance level, e.g., expressing each theorem as a constraint. Despite the efforts and achievements to make knowledge generic and broad, we have not seen a successful model to grasp abstract concepts or systematic theories. We believe integrating higher level features is an essential path toward strong artificial intelligence and would change the paradigm to integrate knowledge.

Designing and implementing machine learning algorithms is an iterative process. This requires humans to analyze the models and knowledge integration to take advantage of human understanding of the real world. This review may help current and prospective users of machine learning to understand these fields and inspire them to build more efficient models.

Acknowledgments

We gratefully acknowledge the support by the Ford Motor Company.

Author Contributions

X.J. drafted Design of Neuron Connections, Reinforcement Learning, Interactive Visual Analytics, and Pre-training in Reinforcement Learning. C.R. drafted Multitask Learning, Simulation, and Transfer Learning. J.Z. drafted Qualitative Domain Knowledge and Quantitative Domain Knowledge. C.D. drafted other sections and edited the manuscript. W.L. supervised this review and revised the manuscript.

References

  1. Abdelaziz et al., 2017

    I. Abdelaziz, A. Fokoue, O. Hassanzadeh, P. Zhang, M. Sadoghi

    Large-scale structural and textual similarity-based mining of knowledge graph to predict drug-drug interactions

    J. Web Semant., 44 (2017), pp. 104-117

  2. Adam-Bourdarios et al., 2015

    C. Adam-Bourdarios, G. Cowan, C. Germain, I. Guyon, B. Kégl, D. Rousseau

    The Higgs boson machine learning challenge

    NIPS 2014 Workshop on High-Energy Physics and Machine Learning (2015), pp. 19-55

  3. Afifi and Brown, 2019

    M. Afifi, M.S. Brown

    What else can fool deep learning? Addressing color constancy errors on deep neural network performance

    Proceedings of the IEEE International Conference on Computer Vision, IEEE (2019), pp. 243-252

  4. Aha and Bankert, 1996

    D.W. Aha, R.L. Bankert

    A comparative evaluation of sequential feature selection algorithms

    Learning from Data, Springer (1996), pp. 199-206

  5. Amos and Kolter, 2017

    B. Amos, J.Z. Kolter

    Optnet: differentiable optimization as a layer in neural networks

    arXiv (2017)

    arXiv:1703.00443

  6. Anderson et al., 2015

    C.W. Anderson, M. Lee, D.L. Elliott

    Faster reinforcement learning after pretraining deep networks to predict state dynamics

    2015 International Joint Conference on Neural Networks, IEEE (2015), pp. 1-7

  7. Argall et al., 2009

    B.D. Argall, S. Chernova, M. Veloso, B. Browning

    A survey of robot learning from demonstration

    Rob. Auton. Syst., 57 (2009), pp. 469-483

  8. Attenberg et al., 2010

    J. Attenberg, P. Melville, F. Provost

    A unified approach to active dual supervision for labeling features and examples

    Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer (2010), pp. 40-55

  9. Bachman et al., 2018

    P. Bachman, A. Sordoni, A. Trischler

    Learning algorithms for active learning

    Proceedings of Machine Learning Research (2018)

    arXiv:1708.00088

  10. Bahnsen et al., 2016

    A.C. Bahnsen, D. Aouada, A. Stojanovic, B. Ottersten

    Feature engineering strategies for credit card fraud detection

    Expert Syst. Appl., 51 (2016), pp. 134-142

  11. Bair and Tibshirani, 2004

    E. Bair, R. Tibshirani

    Semi-supervised methods to predict patient survival from gene expression data

    PLoS Biol., 2 (2004), p. e108

  12. Baldi, 2012

    P. Baldi

    Autoencoders, unsupervised learning, and deep architectures

    Proceedings of ICML Workshop on Unsupervised and Transfer Learning (2012), pp. 37-49

  13. Baram et al., 2004

    Y. Baram, R.E. Yaniv, K. Luz

    Online choice of active learning algorithms

    J. Mach. Learn. Res., 5 (2004), pp. 255-291

  14. Barshan et al., 2011

    E. Barshan, A. Ghodsi, Z. Azimifar, M.Z. Jahromi

    Supervised principal component analysis: visualization, classification and regression on subspaces and submanifolds

    Pattern Recognit., 44 (2011), pp. 1357-1371

  15. Bartolozzi and Indiveri, 2007

    C. Bartolozzi, G. Indiveri

    Synaptic dynamics in analog VLSI

    Neural Comput., 19 (2007), pp. 2581-2603

  16. Bau et al., 2019

    D. Bau, J.-Y. Zhu, H. Strobelt, B. Zhou, J.B. Tenenbaum, W.T. Freeman, A. Torralba

    Visualizing and understanding generative adversarial networks

    arXiv (2019)

    arXiv:1901.09887

  17. Belkin and Niyogi, 2003

    M. Belkin, P. Niyogi

    Laplacian eigenmaps for dimensionality reduction and data representation

    Neural Comput., 15 (2003), pp. 1373-1396

  18. Bengio et al., 2013

    Y. Bengio, A.C. Courville, P. Vincent

    Representation learning: a review and new perspectives

    IEEE Trans. Pattern Anal. Mach. Intell., 35 (2013), pp. 1798-1828

  19. Benjamin et al., 2014

    B.V. Benjamin, P. Gao, E. McQuinn, S. Choudhary, A.R. Chandrasekaran, J.-M. Bussat, R. Alvarez-Icaza, J.V. Arthur, P.A. Merolla, K. Boahen

    Neurogrid: a mixed-analog-digital multichip system for large-scale neural simulations

    Proc. IEEE, 102 (2014), pp. 699-716

  20. Bojarski et al., 2016

    M. Bojarski, A. Choromanska, K. Choromanski, B. Firner, L. Jackel, U. Muller, K. Zieba

    Visualbackprop: efficient visualization of CNNs

    arXiv (2016)

    arXiv:1611.05418

  21. Boluki et al., 2017

    S. Boluki, M.S. Esfahani, X. Qian, E.R. Dougherty

    Incorporating biological prior knowledge for Bayesian learning via maximal knowledge-driven information priors

    BMC Bioinformatics, 18 (2017), p. 552

  22. Brause et al., 1999

    R. Brause, T. Langsdorf, M. Hepp

    Neural data mining for credit card fraud detection

    Proceedings of the 11th International Conference on Tools with Artificial Intelligence, IEEE (1999), pp. 103-106

  23. Brownlee, 2014

    J. Brownlee

    Discover Feature Engineering, How to Engineer Features and How to Get Good at it

    (2014)

    https://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/

  24. Brownlee, 2017

    J. Brownlee

    A Gentle Introduction to Transfer Learning for Deep Learning

    (2017)

    https://machinelearningmastery.com/transfer-learning-for-deep-learning/

  25. Burbidge et al., 2007

    R. Burbidge, J.J. Rowland, R.D. King

    Active learning for regression based on query by committee

    International Conference on Intelligent Data Engineering and Automated Learning, Springer (2007), pp. 209-218

  26. Burkov and Chaib-Draa, 2007

    A. Burkov, B. Chaib-Draa

    Reducing the complexity of multiagent reinforcement learning

    Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (2007), pp. 1-3

  27. Burlina et al., 2017

    P. Burlina, K.D. Pacheco, N. Joshi, D.E. Freund, N.M. Bressler

    Comparing humans and deep learning performance for grading AMD: a study in using universal deep features and transfer learning for automated AMD analysis

    Comput. Biol. Med., 82 (2017), pp. 80-86

  28. Cao et al., 2019

    Y. Cao, X. Wang, X. He, Z. Hu, T.-S. Chua

    Unifying knowledge graph learning and recommendation: towards a better understanding of user preferences

    The World Wide Web Conference (2019), pp. 151-161

  29. Chandrashekar and Sahin, 2014

    G. Chandrashekar, F. Sahin

    A survey on feature selection methods

    Comput. Electr. Eng., 40 (2014), pp. 16-28

  30. Chawla et al., 2002

    N.V. Chawla, K.W. Bowyer, L.O. Hall, W.P. Kegelmeyer

    SMOTE: synthetic minority over-sampling technique

    J. Artif. Intell. Res., 16 (2002), pp. 321-357

  31. Chen et al., 2010

    C. Chen, L. Zhang, J. Bu, C. Wang, W. Chen

    Constrained Laplacian eigenmap for dimensionality reduction

    Neurocomputing, 73 (2010), pp. 951-958

  32. Chen et al., 2018

    N.C. Chen, M. Drouhard, R. Kocielnik, J. Suh, C.R. Aragon

    Using machine learning to support qualitative coding in social science: shifting the focus to ambiguity

    ACM TiiS, 8 (2018), pp. 1-20

  33. Chen et al., 2020

    T. Chen, S. Kornblith, M. Norouzi, G. Hinton

    A simple framework for contrastive learning of visual representations

    arXiv (2020)

    arXiv:2002.05709

  34. Choo and Liu, 2018

    J. Choo, S. Liu

    Visual analytics for explainable deep learning

    IEEE Comput. Graph. Appl., 38 (2018), pp. 84-92

  35. Choo et al., 2010

    J. Choo, H. Lee, J. Kihm, H. Park

    iVisClassifier: an interactive visual analytics system for classification based on supervised dimension reduction

    2010 IEEE Symposium on Visual Analytics Science and Technology, IEEE (2010), pp. 27-34

  36. Christodoulidis et al., 2016

    S. Christodoulidis, M. Anthimopoulos, L. Ebner, A. Christe, S. Mougiakakou

    Multisource transfer learning with convolutional neural networks for lung pattern analysis

    IEEE J. Biomed. Health Inform., 21 (2016), pp. 76-84

  37. Cohen and Welling, 2016

    T. Cohen, M. Welling

    Group equivariant convolutional networks

    International Conference on Machine Learning (2016), pp. 2990-2999

  38. Crowston et al., 2012

    K. Crowston, E.E. Allen, R. Heckman

    Using natural language processing technology for qualitative data analysis

    Int. J. Soc. Res. Methodol., 15 (2012), pp. 523-543

  39. Daniušis et al., 2016

    P. Daniušis, P. Vaitkus, L. Petkevičius

    Hilbert–Schmidt component analysis

    Proc. Lith. Math. Soc. Ser. A., 57 (2016), pp. 7-11

  40. Dasgupta and Hsu, 2008

    S. Dasgupta, D. Hsu

    Hierarchical sampling for active learning

    Proceedings of the 25th International Conference on Machine Learning (2008), pp. 208-215

  41. Dash and Liu, 2003

    M. Dash, H. Liu

    Consistency-based search in feature selection

    Artif. Intell., 151 (2003), pp. 155-176

  42. Day and Khoshgoftaar, 2017

    O. Day, T.M. Khoshgoftaar

    A survey on heterogeneous transfer learning

    J. Big Data, 4 (2017), p. 29

  43. DeBrusk, 2018

    C. DeBrusk

    The Risk of Machine-Learning Bias (And How to Prevent it)

    MIT Sloan Management Review (2018)

  44. Davies et al., 2018

    M. Davies, N. Srinivasa, T.H. Lin, G. Chinya, Y. Cao, S.H. Chody, G. Dimou, P. Joshi, N. Imam, S. Jain, et al.

    Loihi: a neuromorphic manycore processor with on-chip learning

    IEEE Micro, 38 (2018), pp. 82-99

  45. Deng et al., 2020

    C. Deng, Y. Wang, C. Qin, W. Lu

    Self-directed online machine learning for topology optimization

    arXiv (2020)

    arXiv:2002.01927

  46. Deng et al., 2014

    J. Deng, N. Ding, Y. Jia, A. Frome, K. Murphy, S. Bengio, Y. Li, H. Neven, H. Adam

    Large-scale object classification using label relation graphs

    European Conference on Computer Vision, Springer (2014), pp. 48-64

  47. Deshmukh et al., 2005

    A. Deshmukh, J. Morghade, A. Khera, P. Bajaj

    Binary neural networks – a CMOS design approach

    International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Springer (2005), pp. 1291-1296

  48. DeVries and Taylor, 2017

    T. DeVries, G.W. Taylor

    Dataset augmentation in feature space

    arXiv (2017)

    arXiv:1702.05538

  49. Dhanjal et al., 2008

    C. Dhanjal, S.R. Gunn, J. Shawe-Taylor

    Efficient sparse kernel feature extraction based on partial least squares

    IEEE Trans. Pattern Anal. Mach. Intell., 31 (2008), pp. 1347-1361

  50. Díaz et al., 2016

    I. Díaz, A.A. Cuadrado, M. Verleysen

    A statespace model on interactive dimensionality reduction

    24th European Symposium on Artificial Neural Networks (2016), pp. 647-652

  51. Drachman, 2005

    D.A. Drachman

    Do we have brain to spare?

    Neurology, 64 (2005), pp. 2004-2005

  52. Druck et al., 2009

    G. Druck, B. Settles, A. McCallum

    Active learning by labeling features

    Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (2009), pp. 81-90

  53. Du et al., 2015

    B. Du, Z. Wang, L. Zhang, L. Zhang, W. Liu, J. Shen, D. Tao

    Exploring representativeness and informativeness for active learning

    IEEE Trans. Cybern., 47 (2015), pp. 14-26

  54. Ebden, 2015

    M. Ebden

    Gaussian processes: a quick introduction

    arXiv (2015)

    arXiv:1505.02965

  55. Ebert et al., 2012

    S. Ebert, M. Fritz, B. Schiele

    Ralf: a reinforced active learning formulation for object class recognition

    2012 IEEE Conference on Computer Vision and Pattern Recognition (2012), pp. 3626-3633

  56. Ehrlich et al., 2016

    M. Ehrlich, T.J. Shields, T. Almaev, M.R. Amer

    Facial attributes classification using multi-task representation learning

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2016), pp. 47-55

  57. Endert et al., 2012

    A. Endert, P. Fiaux, C. North

    Semantic interaction for visual text analytics

    Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2012), pp. 473-482

  58. Endert et al., 2017

    A. Endert, W. Ribarsky, C. Turkay, B.W. Wong, I. Nabney, I.D. Blanco, F. Rossi

    The state of the art in integrating machine learning into visual analytics

    Comput. Graphics Forum, 36 (2017), pp. 458-486

  59. Ermon et al., 2014

    S. Ermon, R.L. Bras, S.K. Suram, J.M. Gregoire, C. Gomes, B. Selman, R.B.V. Dover

    Pattern decomposition with complex combinatorial constraints: application to materials discovery

    arXiv (2014)

    arXiv:1411.7441

  60. Fails and Olsen, 2003

    J.A. Fails, D.R. Olsen

    Interactive machine learning

    Proceedings of the 8th International Conference on Intelligent User Interfaces (2003), pp. 39-45

  61. Fadaee et al., 2017

    M. Fadaee, A. Bisazza, C. Monz

    Data augmentation for low-resource neural machine translation

    arXiv (2017)

    arXiv:1705.00440

  62. Farahmand et al., 2017

    A. Farahmand, S. Nabi, D.N. Nikovski

    Deep reinforcement learning for partial differential equation control

    2017 American Control Conference, IEEE (2017), pp. 3120-3127

  63. Fathinezhad et al., 2016

    F. Fathinezhad, V. Derhami, M. Rezaeian

    Supervised fuzzy reinforcement learning for robot navigation

    Appl. Soft Comput., 40 (2016), pp. 33-41

  64. Fellbaum, 2012

    C. Fellbaum

    WordNet

    C. Chapelle (Ed.), The Encyclopedia of Applied Linguistics, Blackwell Publishing Ltd. (2012), pp. 1-8

  65. Finn et al., 2016

    C. Finn, T. Yu, J. Fu, P. Abbeel, S. Levine

    Generalizing skills with semi-supervised reinforcement learning

    arXiv (2016)

    arXiv:1612.00429

  66. Fisher, 1936

    R.A. Fisher

    The use of multiple measurements in taxonomic problems

    Ann. Eugen., 7 (1936), pp. 179-188

  67. Flasiński, 2016

    M. Flasiński

    Symbolic artificial intelligence

    Introduction to Artificial Intelligence, Springer (2016), pp. 15-22

  68. Flores et al., 2011

    M.J. Flores, A.E. Nicholson, A. Brunskill, K.B. Korb, S. Mascaro

    Incorporating expert knowledge when learning Bayesian network structure: a medical case study

    Artif. Intell. Med., 53 (2011), pp. 181-204

  69. Fogg, 2017

    A. Fogg

    A History of Machine Learning and Deep Learning

    (2017)

    https://www.import.io/post/history-of-deep-learning/

  70. Frank and Bouckaert, 2006

    E. Frank, R.R. Bouckaert

    Naive Bayes for text classification with unbalanced classes

    European Conference on Principles of Data Mining and Knowledge Discovery, Springer (2006), pp. 503-510

  71. Frohlich et al., 2003

    H. Frohlich, O. Chapelle, B. Scholkopf

    Feature selection for support vector machines by means of genetic algorithm

    Proceedings. 15th IEEE International Conference on Tools with Artificial Intelligence (2003), pp. 142-148

  72. Fu, 1993

    L.M. Fu

    Knowledge-based connectionism for revising domain theories

    IEEE Trans. Syst. Man. Cybern. Syst., 23 (1993), pp. 173-182

  73. Gabriel et al., 2019

    V. Gabriel, Y. Du, M.E. Taylor

    Pre-training with non-expert human demonstration for deep reinforcement learning

    Knowl. Eng. Rev., 34 (2019), p. e10

  74. Gal and Ghahramani, 2016

    Y. Gal, Z. Ghahramani

    Dropout as a Bayesian approximation: representing model uncertainty in deep learning

    International Conference on Machine Learning (2016), pp. 1050-1059

  75. Gal et al., 2017

    Y. Gal, R. Islam, Z. Ghahramani

    Deep Bayesian active learning with image data

    arXiv (2017)

    arXiv:1703.02910

  76. Gan et al., 2019

    J. Gan, Q. Cai, P. Galer, D. Ma, X. Chen, J. Huang, S. Bao, R. Luo

    Mapping the knowledge structure and trends of epilepsy genetics over the past decade: a co-word analysis based on medical subject headings terms

    Medicine, 98 (2019), p. e16782

  77. Gao and Lu, 2020

    T. Gao, W. Lu

    Physical model and machine learning enabled electrolyte channel design for fast charging

    J. Electrochem. Soc., 167 (2020), p. 110519

  78. Gao and Mosalam, 2018

    Y. Gao, K.M. Mosalam

    Deep transfer learning for image-based structural damage recognition

    Comput. Aided Civil Infrastruct. Eng., 33 (2018), pp. 748-768

  79. Garnelo and Shanahan, 2019

    M. Garnelo, M. Shanahan

    Reconciling deep learning with symbolic artificial intelligence: representing objects and relations

    Curr. Opin. Behav. Sci., 29 (2019), pp. 17-23

  80. Gatys et al., 2015

    L.A. Gatys, A.S. Ecker, M. Bethge

    A neural algorithm of artistic style

    arXiv (2015)

    arXiv:1508.06576

  81. Gens and Domingos, 2014

    R. Gens, P.M. Domingos

    Deep symmetry networks

    Advances in Neural Information Processing Systems (2014), pp. 2537-2545

  82. Ghazi et al., 2017

    M.M. Ghazi, B. Yanikoglu, E. Aptoula

    Plant identification using deep neural networks via optimization of transfer learning parameters

    Neurocomputing, 235 (2017), pp. 228-235

  83. Ghojogh et al., 2019

    B. Ghojogh, M.N. Samad, S.A. Mashhadi, T. Kapoor, W. Ali, F. Karray, M. Crowley

    Feature selection and feature extraction in pattern analysis: a literature review

    arXiv (2019)

    arXiv:1905.02845

  84. Ghosh and Gupta, 2019

    R. Ghosh, A.K. Gupta

    Scale steerable filters for locally scale-invariant convolutional neural networks

    arXiv (2019)

    arXiv:1906.03861

  85. Giffard-Roisin et al., 2018

    S. Giffard-Roisin, H. Delingette, T. Jackson, J. Webb, L. Fovargue, J. Lee, C.A. Rinaldi, R. Razavi, N. Ayache, M. Sermesant

    Transfer learning from simulations on a reference anatomy for ECGI in personalized cardiac resynchronization therapy

    IEEE Trans. Biomed. Eng., 66 (2018), pp. 343-353

  86. Girshick, 2015

    R. Girshick

    Fast R-CNN

    Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1440-1448

  87. Gong et al., 2019

    T. Gong, T. Lee, C. Stephenson, V. Renduchintala, S. Padhy, A. Ndirango, G. Keskin, O.H. Elibol

    A comparison of loss weighting strategies for multi task learning in deep neural networks

    IEEE Access, 7 (2019), pp. 141627-141632

  88. Goodfellow et al., 2014

    I.J. Goodfellow, J. Shlens, C. Szegedy

    Explaining and harnessing adversarial examples

    arXiv (2014)

    arXiv:1412.6572

  89. Gopalakrishnan et al., 2017

    K. Gopalakrishnan, S.K. Khaitan, A. Choudhary, A. Agrawal

    Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection

    Constr. Build. Mater., 157 (2017), pp. 322-330

  90. Gori et al., 2005

    M. Gori, G. Monfardini, F. Scarselli

    A new model for learning in graph domains

    Proceedings of 2005 IEEE International Joint Conference on Neural Networks (2005), pp. 729-734

  91. Graves et al., 2013

    A. Graves, A. Mohamed, G. Hinton

    Speech recognition with deep recurrent neural networks

    2013 IEEE International Conference on Acoustics, Speech and Signal Processing (2013), pp. 6645-6649

  92. Griffith et al., 2013

    S. Griffith, K. Subramanian, J. Scholz, C.L. Isbell, A.L. Thomaz

    Policy shaping: integrating human feedback with reinforcement learning

    C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, K.Q. Weinberger (Eds.), Advances in Neural Information Processing Systems, Curran Associates (2013), pp. 2625-2633

  93. Gui et al., 2018

    L. Gui, R. Xu, Q. Lu, J. Du, Y. Zhou

    Negative transfer detection in transductive transfer learning

    Int. J. Mach. Learn. Cybern., 9 (2018), pp. 185-197

  94. He et al., 2020

    K. He, H. Fan, Y. Wu, S. Xie, R. Girshick

    Momentum contrast for unsupervised visual representation learning

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), pp. 9729-9738

  95. He et al., 2016

    K. He, X. Zhang, S. Ren, J. Sun

    Deep residual learning for image recognition

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770-778

  96. Hennecke et al., 2012

    M. Hennecke, W. Frings, W. Homberg, A. Zitz, M. Knobloch, H. Böttiger

    Measuring power consumption on IBM Blue gene/P

    Comput. Sci. Res. Dev., 27 (2012), pp. 329-336

  97. Herculano-Houzel, 2009

    S. Herculano-Houzel

    The human brain in numbers: a linearly scaled-up primate brain

    Front. Hum. Neurosci., 3 (2009), p. 31

  98. Hinton et al., 2011

    G.E. Hinton, A. Krizhevsky, S.D. Wang

    Transforming auto-encoders

    International Conference on Artificial Neural Networks, Springer (2011), pp. 44-51

  99. Hinton et al., 2012

    G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R.R. Salakhutdinov

    Improving neural networks by preventing co-adaptation of feature detectors

    arXiv (2012)

    arXiv:1207.0580

  100. Hochreiter and Schmidhuber, 1997

    S. Hochreiter, J. Schmidhuber

    Long short-term memory

    Neural Comput., 9 (1997), pp. 1735-1780

  101. Hodgkin and Huxley, 1952

    A.L. Hodgkin, A.F. Huxley

    A quantitative description of membrane current and its application to conduction and excitation in nerve

    J. Physiol., 117 (1952), p. 500

  102. Hoehndorf and Queralt-Rosinach, 2017

    R. Hoehndorf, N. Queralt-Rosinach

    Data science and symbolic AI: synergies, challenges and opportunities

    Data Sci., 1 (2017), pp. 27-38

  103. Hohman et al., 2018

    F. Hohman, M. Kahng, R. Pienta, D.H. Chau

    Visual analytics in deep learning: an interrogative survey for the next frontiers

    IEEE Trans. Vis. Comput. Graph., 25 (2018), pp. 2674-2693

  104. Hoi et al., 2009

    S.C. Hoi, R. Jin, J. Zhu, M.R. Lyu

    Semisupervised SVM batch mode active learning with applications to image retrieval

    ACM Trans. Inf. Syst., 27 (2009), pp. 1-29

  105. Holmes et al., 2019

    T.W. Holmes, K. Ma, A. Pourmorteza

    Combination of CT motion simulation and deep convolutional neural networks with transfer learning to recover Agatston scores

    15th International Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine, International Society for Optics and Photonics (2019), p. 110721Z

  106. Holzinger, 2016

    A. Holzinger

    Interactive machine learning for health informatics: when do we need the human-in-the-loop?

    Brain Inform., 3 (2016), pp. 119-131

  107. Holzinger et al., 2018

    A. Holzinger, P. Kieseberg, E. Weippl, A.M. Tjoa

    Current advances, trends and challenges of machine learning and knowledge extraction: from machine learning to explainable AI

    International Cross-Domain Conference for Machine Learning and Knowledge Extraction, Springer (2018), pp. 1-8

  108. Holzinger et al., 2019

    A. Holzinger, M. Plass, M. Kickmeier-Rust, K. Holzinger, G.C. Crişan, C.-M. Pintea, V. Palade

    Interactive machine learning: experimental evidence for the human in the algorithmic loop

    Appl. Intell., 49 (2019), pp. 2401-2414

  109. Holzinger et al., 2013

    A. Holzinger, C. Stocker, B. Ofner, G. Prohaska, A. Brabenetz, R. Hofmann-Wellenhof

    Combining HCI, natural language processing, and knowledge discovery-potential of IBM content analytics as an assistive technology in the biomedical field

    International Workshop on Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data, Springer (2013), pp. 13-24

  110. Hu et al., 2018

    Y. Hu, A. Huber, J. Anumula, S.-C. Liu

    Overcoming the vanishing gradient problem in plain recurrent networks

    arXiv (2018)

    arXiv:1801.06105

  111. Hu et al., 2019

    Y. Hu, A. Nakhaei, M. Tomizuka, K. Fujimura

    Interaction-aware decision making with adaptive strategies under merging scenarios

    arXiv (2019)

    arXiv:1904.06025

  112. Hu et al., 2016

    Z. Hu, X. Ma, Z. Liu, E. Hovy, E. Xing

    Harnessing deep neural networks with logic rules

    arXiv (2016)

    arXiv:1603.06318

  113. Huang et al., 2010

    S.-J. Huang, R. Jin, Z.-H. Zhou

    Active learning by querying informative and representative examples

    J.D. Lafferty, C.K.I. Williams, J. Shawe-Taylor, R.S. Zemel, A. Culotta (Eds.), Advances in Neural Information Processing Systems, Curran Associates (2010), pp. 892-900

  114. Hwang et al., 2020

    Y. Hwang, H. Cho, H. Yang, I. Oh, S.-W. Lee

    Mel-spectrogram augmentation for sequence to sequence voice conversion

    arXiv (2020)

    arXiv:2001.01401

  115. Hyvärinen, 2013

    A. Hyvärinen

    Independent component analysis: recent advances

    Philos. T. R. Soc. A., 371 (2013), p. 20110534

  116. Ikebata et al., 2017

    H. Ikebata, K. Hongo, T. Isomura, R. Maezono, R. Yoshida

    Bayesian molecular design with a chemical language model

    J. Comput. Aided Mol. Des., 31 (2017), pp. 379-391

  117. Inoue, 2018

    H. Inoue

    Data augmentation by pairing samples for images classification

    arXiv (2018)

    arXiv:1801.02929

  118. Jackson et al., 2019

    P.T. Jackson, A.A. Abarghouei, S. Bonner, T.P. Breckon, B. Obara

    Style augmentation: data augmentation via style randomization

    CVPR Workshops (2019), pp. 83-92

  119. Jeong et al., 2009

    D.H. Jeong, C. Ziemkiewicz, B. Fisher, W. Ribarsky, R. Chang

    iPCA: An Interactive System for PCA-based Visual Analytics

    Comput. Graphics Forum, 28 (2009), pp. 767-774

  120. Ji et al., 2020

    X.A. Ji, T.G. Molnar, S.S. Avedisov, G. Orosz

    Feed-forward neural network with trainable delay

    A. Bayen, A. Jadbabaie, G.J. Pappas, P. Parrilo, B. Recht, C. Tomlin, M. Zeilinger (Eds.), Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR (2020), pp. 127-136

  121. Jurio et al., 2010

    A. Jurio, M. Pagola, M. Galar, C. Lopez-Molina, D. Paternain

    A comparison study of different color spaces in clustering based image segmentation

    International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Springer (2010), pp. 532-541

  122. Kanazawa et al., 2014

    A. Kanazawa, A. Sharma, D. Jacobs

    Locally scale-invariant convolutional neural networks

    arXiv (2014)

    arXiv:1412.5104

  123. Kang et al., 2017

    G. Kang, X. Dong, L. Zheng, Y. Yang

    Patchshuffle regularization

    arXiv (2017)

    arXiv:1707.07103

  124. Karpathy et al., 2015

    A. Karpathy, J. Johnson, L. Fei-Fei

    Visualizing and understanding recurrent networks

    arXiv (2015)

    arXiv:1506.02078

  125. Karri et al., 2017

    S.P.K. Karri, D. Chakraborty, J. Chatterjee

    Transfer learning based classification of optical coherence tomography images with diabetic macular edema and dry age-related macular degeneration

    Biomed. Opt. Express, 8 (2017), pp. 579-592

  126. Kaya et al., 2017

    H. Kaya, F. Gürpınar, A.A. Salah

    Video-based emotion recognition in the wild using deep transfer learning and score fusion

    Image Vis. Comput., 65 (2017), pp. 66-75

  127. Kelley, 1960

    H.J. Kelley

    Gradient theory of optimal flight paths

    ARS J., 30 (1960), pp. 947-954

  128. Knox and Stone, 2008

    W.B. Knox, P. Stone

    Tamer: training an agent manually via evaluative reinforcement

    2008 7th IEEE International Conference on Development and Learning (2008), pp. 292-297

  129. Knox and Stone, 2010

    W.B. Knox, P. Stone

    Combining manual feedback with subsequent MDP reward signals for reinforcement learning

    Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (2010), pp. 5-12

  130. Knox and Stone, 2012

    W.B. Knox, P. Stone

    Reinforcement learning from simultaneous human and MDP reward

    Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (2012), pp. 475-482

  131. Kok and Vlassis, 2004

    J.R. Kok, N. Vlassis

    Sparse tabular multiagent Q-learning

    Annual Machine Learning Conference of Belgium and the Netherlands (2004), pp. 65-71

  132. Konyushkova et al., 2017

    K. Konyushkova, R. Sznitman, P. Fua

    Learning active learning from data

    I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Curran Associates (2017), pp. 4225-4235

  133. Kromp et al., 2016

    F. Kromp, I. Ambros, T. Weiss, D. Bogen, H. Dodig, M. Berneder, T. Gerber, S. Taschner-Mandl, P. Ambros, A. Hanbury

    Machine learning framework incorporating expert knowledge in tissue image annotation

    2016 International Conference on Pattern Recognition, IEEE (2016), pp. 343-348

  134. Kursa and Rudnicki, 2010

    M.B. Kursa, W.R. Rudnicki

    Feature selection with the Boruta package

    J. Stat. Softw., 36 (2010), pp. 1-13

  135. Lazaric et al., 2008

    A. Lazaric, M. Restelli, A. Bonarini

    Reinforcement learning in continuous action spaces through sequential Monte Carlo methods

    J.C. Platt, D. Koller, Y. Singer, S.T. Roweis (Eds.), Advances in Neural Information Processing Systems, Curran Associates (2008), pp. 833-840

  136. LeCun et al., 2015

    Y. LeCun, Y. Bengio, G. Hinton

    Deep learning

    Nature, 521 (2015), pp. 436-444

  137. Lee and Kim, 2019

    K. Lee, D. Kim

    In-silico molecular binding prediction for human drug targets using deep neural multi-task learning

    Genes, 10 (2019), p. 906

  138. Li et al., 2016

    X. Li, L. Zhao, L. Wei, M.-H. Yang, F. Wu, Y. Zhuang, H. Ling, J. Wang

    DeepSaliency: multi-task deep neural network model for salient object detection

    IEEE Trans. Image Process., 25 (2016), pp. 3919-3930

  139. Li et al., 2018

    Y. Li, M.R. Min, D. Shen, D.E. Carlson, L. Carin

    Video generation from text

    AAAI Conference on Artificial Intelligence (2018), pp. 7065-7072

  140. Liem et al., 2018

    C.C. Liem, M. Langer, A. Demetriou, A.M. Hiemstra, A.S. Wicaksana, M.P. Born, C.J. König

    Psychology meets machine learning: interdisciplinary perspectives on algorithmic job candidate screening

    H. Escalante, S. Escalera, I. Guyon, X. Baró, Y. Güçlütürk, U. Güçlü, M. van Gerven (Eds.), Explainable and Interpretable Models in Computer Vision and Machine Learning, Springer (2018), pp. 197-253

  141. Lin et al., 2017

    Z. Lin, B. Harrison, A. Keech, M.O. Riedl

    Explore, exploit or listen: combining human feedback and policy model to speed up deep reinforcement learning in 3D worlds

    arXiv (2017)

    arXiv:1709.03969

  142. Lin et al., 2018

    Z. Lin, Y. Shi, Z. Xue

    IDSGAN: generative adversarial networks for attack generation against intrusion detection

    arXiv (2018)

    arXiv:1809.02077

  143. Ling et al., 2016

    J. Ling, R. Jones, J. Templeton

    Machine learning strategies for systems with invariance properties

    J. Comput. Phys., 318 (2016), pp. 22-35

  144. Liu et al., 2019

    Z. Liu, J. Wang, S. Gong, H. Lu, D. Tao

    Deep reinforcement active learning for human-in-the-loop person re-identification

    Proceedings of the IEEE International Conference on Computer Vision (2019), pp. 6122-6131

  145. Loftin et al., 2016

    R. Loftin, B. Peng, J. MacGlashan, M.L. Littman, M.E. Taylor, J. Huang, D.L. Roberts

    Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning

    Auton. Agent Multi Agent Syst., 30 (2016), pp. 30-59

  146. Long et al., 2017

    M. Long, Z. Cao, J. Wang, P.S. Yu

    Learning multiple tasks with multilinear relationship networks

    I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Curran Associates (2017), pp. 1594-1603

  147. Lowe and Barnett, 1994

    H.J. Lowe, G.O. Barnett

    Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches

    JAMA, 271 (1994), pp. 1103-1108

  148. Ma, 2019

    E. Ma

    Data Augmentation in NLP

    (2019)

    https://towardsdatascience.com/data-augmentation-in-nlp-2801a34dfc28

  149. Maaten and Hinton, 2008

    L. van der Maaten, G. Hinton

    Visualizing data using t-SNE

    J. Mach. Learn. Res., 9 (2008), pp. 2579-2605

  150. MacGlashan et al., 2017

    J. MacGlashan, M.K. Ho, R. Loftin, B. Peng, D. Roberts, M.E. Taylor, M.L. Littman

    Interactive learning from policy-dependent human feedback

    arXiv (2017)

    arXiv:1701.06049

  151. Mafarja et al., 2017

    M.M. Mafarja, D. Eleyan, I. Jaber, A. Hammouri, S. Mirjalili

    Binary dragonfly algorithm for feature selection

    2017 International Conference on New Trends in Computing Sciences, IEEE (2017), pp. 12-17

  152. Mann and McCallum, 2010

    G.S. Mann, A. McCallum

    Generalized expectation criteria for semi-supervised learning with weakly labeled data

    J. Mach. Learn. Res., 11 (2010), pp. 955-984

  153. Mao et al., 2019

    J. Mao, C. Gan, P. Kohli, J.B. Tenenbaum, J. Wu

    The neuro-symbolic concept learner: interpreting scenes, words, and sentences from natural supervision

    arXiv (2019)

    arXiv:1904.12584

  154. Martinez et al., 2017

    M. Martinez, C. Sitawarin, K. Finch, L. Meincke, A. Yablonski, A. Kornhauser

    Beyond Grand Theft Auto V for training, testing and enhancing deep learning in self driving cars

    arXiv (2017)

    arXiv:1712.01397

  155. Masegosa and Moral, 2013

    A.R. Masegosa, S. Moral

    An interactive approach for Bayesian network learning using domain/expert knowledge

    Int. J. Approx. Reason., 54 (2013), pp. 1168-1181

  156. Mayr et al., 2016

    A. Mayr, G. Klambauer, T. Unterthiner, S. Hochreiter

    DeepTox: toxicity prediction using deep learning

    Front. Environ. Sci., 3 (2016), p. 80

  157. McCulloch and Pitts, 1943

    W.S. McCulloch, W. Pitts

    A logical calculus of the ideas immanent in nervous activity

    Bull. Math. Biol., 5 (1943), pp. 115-133

  158. Merolla et al., 2014

    P.A. Merolla, J.V. Arthur, R. Alvarez-Icaza, A.S. Cassidy, J. Sawada, F. Akopyan, B.L. Jackson, N. Imam, C. Guo, Y. Nakamura

    A million spiking-neuron integrated circuit with a scalable communication network and interface

    Science, 345 (2014), pp. 668-673

  159. Meyerson and Miikkulainen, 2018

    E. Meyerson, R. Miikkulainen

    Pseudo-task augmentation: from deep multitask learning to intratask sharing-and back

    arXiv (2018)

    arXiv:1803.04062

  160. Mignot and Peeters, 2019

    R. Mignot, G. Peeters

    An analysis of the effect of data augmentation methods: experiments for a musical genre classification task

    Trans. Int. Soc. Music Inf. Retr., 2 (2019), pp. 97-110

  161. Mika et al., 1999

    S. Mika, G. Rätsch, J. Weston, B. Schölkopf, K.-R. Müller

    Fisher discriminant analysis with kernels

    Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, IEEE (1999), pp. 41-48

  162. Misra et al., 2016

    I. Misra, A. Shrivastava, A. Gupta, M. Hebert

    Cross-stitch networks for multi-task learning

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 3994-4003

  163. Modha, 2017

    D.S. Modha

    Introducing a Brain-Inspired Computer

    (2017)

    https://www.research.ibm.com/articles/brain-chip.shtml

  164. Mor et al., 2020

    B. Mor, S. Garhwal, A. Kumar

    A systematic review of hidden Markov models and their applications

    Arch. Comput. Methods Eng. (2020), 10.1007/s11831-020-09422-4

  165. Murphy, 2012

    K.P. Murphy

    Machine Learning: A Probabilistic Perspective

    MIT Press (2012)

  166. Nakamura et al., 2012a

    K. Nakamura, W.-J. Kuo, F. Pegado, L. Cohen, O.J. Tzeng, S. Dehaene

    Universal brain systems for recognizing word shapes and handwriting gestures during reading

    Proc. Natl. Acad. Sci., 109 (2012), pp. 20762-20767

  167. Nakamura et al., 2012b

    R.Y. Nakamura, L.A. Pereira, K.A. Costa, D. Rodrigues, J.P. Papa, X.-S. Yang

    BBA: a binary bat algorithm for feature selection

    25th SIBGRAPI Conference on Graphics, Patterns and Images, IEEE (2012), pp. 291-297

  168. Nanni et al., 2020

    L. Nanni, G. Maguolo, M. Paci

    Data augmentation approaches for improving animal audio classification

    Ecol. Inform., 57 (2020), p. 101084

  169. Navarro-Guerrero et al., 2012

    N. Navarro-Guerrero, C. Weber, P. Schroeter, S. Wermter

    Real-world reinforcement learning for autonomous humanoid robot docking

    Rob. Auton. Syst., 60 (2012), pp. 1400-1407

  170. Nawrocki et al., 2016

    R.A. Nawrocki, R.M. Voyles, S.E. Shaheen

    A mini review of neuromorphic architectures and implementations

    IEEE Trans. Electron. Devices, 63 (2016), pp. 3819-3829

  171. Nguyen and Smeulders, 2004

    H.T. Nguyen, A. Smeulders

    Active learning using pre-clustering

    Proceedings of the 21st International Conference on Machine Learning (2004), p. 79

  172. Oord et al., 2016

    A. van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves

    Conditional image generation with PixelCNN decoders

    D.D. Lee, M. Sugiyama, U. von Luxburg, I. Guyon, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Curran Associates (2016), pp. 4790-4798

  173. Oord et al., 2016a

    A. van den Oord, N. Kalchbrenner, K. Kavukcuoglu

    Pixel recurrent neural networks

    arXiv (2016)

    arXiv:1601.06759

  174. Pan and Yang, 2009

    S.J. Pan, Q. Yang

    A survey on transfer learning

    IEEE Trans. Knowl. Data Eng., 22 (2009), pp. 1345-1359

  175. Pang et al., 2018

    K. Pang, M. Dong, Y. Wu, T. Hospedales

    Meta-learning transferable active learning policies by deep reinforcement learning

    arXiv (2018)

    arXiv:1806.04798

  176. Parish and Duraisamy, 2016

    E.J. Parish, K. Duraisamy

    A paradigm for data-driven predictive modeling using field inversion and machine learning

    J. Comput. Phys., 305 (2016), pp. 758-774

  177. Park, 2019

    D.S. Park

    SpecAugment: A New Data Augmentation Method for Automatic Speech Recognition

    (2019)

    https://ai.googleblog.com/2019/04/specaugment-new-data-augmentation.html

  178. Paulheim, 2017

    H. Paulheim

    Knowledge graph refinement: a survey of approaches and evaluation methods

    Semantic Web, 8 (2017), pp. 489-508

  179. Peters, 2019

    B.G. Peters

    Institutional Theory in Political Science: The New Institutionalism

    Edward Elgar Publishing (2019)

  180. Qu et al., 2019

    M. Qu, Y. Bengio, J. Tang

    GMNN: graph Markov neural networks

    arXiv (2019)

    arXiv:1905.06214

  181. Raghavan et al., 2006

    H. Raghavan, O. Madani, R. Jones

    Active learning with feedback on features and instances

    J. Mach. Learn. Res., 7 (2006), pp. 1655-1686

  182. Raissi et al., 2017

    M. Raissi, P. Perdikaris, G.E. Karniadakis

    Machine learning of linear differential equations using Gaussian processes

    J. Comput. Phys., 348 (2017), pp. 683-693

  183. Ramamurthy et al., 2019

    R. Ramamurthy, C. Bauckhage, R. Sifa, J. Schücker, S. Wrobel

    Leveraging domain knowledge for reinforcement learning using MMC architectures

    International Conference on Artificial Neural Networks, Springer (2019), pp. 595-607

  184. Ramires and Serra, 2019

    A. Ramires, X. Serra

    Data augmentation for instrument classification robust to audio effects

    arXiv (2019)

    arXiv:1907.08520

  185. Ramsundar et al., 2015

    B. Ramsundar, S. Kearnes, P. Riley, D. Webster, D. Konerding, V. Pande

    Massively multitask networks for drug discovery

    arXiv (2015)

    arXiv:1502.02072

  186. Ranjan et al., 2017

    R. Ranjan, V.M. Patel, R. Chellappa

    HyperFace: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition

    IEEE Trans. Pattern Anal. Mach. Intell., 41 (2017), pp. 121-135

  187. Reed et al., 2016

    S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, H. Lee

    Generative adversarial text to image synthesis

    arXiv (2016)

    arXiv:1605.05396

  188. Rennie et al., 2003

    J.D. Rennie, L. Shih, J. Teevan, D.R. Karger

    Tackling the poor assumptions of naive Bayes text classifiers

    Proceedings of the 20th International Conference on Machine Learning (2003), pp. 616-623

  189. Ritter et al., 2017

    S. Ritter, D.G. Barrett, A. Santoro, M.M. Botvinick

    Cognitive psychology for deep neural networks: a shape bias case study

    arXiv (2017)

    arXiv:1706.08606

  190. Ritzer and Stepnisky, 2017

    G. Ritzer, J. Stepnisky

    Modern Sociological Theory

    Sage Publications (2017)

  191. Rong and Adar, 2016

    X. Rong, E. Adar

    Visual tools for debugging neural language models

    Proceedings of ICML Workshop on Visualization for Deep Learning (2016)

  192. Rosenblatt, 1957

    F. Rosenblatt

    The Perceptron, a Perceiving and Recognizing Automaton

    Cornell Aeronautical Laboratory (1957)

  193. Rosenfeld et al., 2018

    A. Rosenfeld, M. Cohen, M.E. Taylor, S. Kraus

    Leveraging human knowledge in tabular reinforcement learning: a study of human subjects

    Knowl. Eng. Rev., 33 (2018), p. e14

  194. Ruder, 2017a

    S. Ruder

    An overview of multi-task learning in deep neural networks

    arXiv (2017)

    arXiv:1706.05098

  195. Ruder, 2017b

    S. Ruder

    Transfer Learning-Machine Learning's Next Frontier

    (2017)

    https://ruder.io/transfer-learning/

  196. Ruder et al., 2019

    S. Ruder, J. Bingel, I. Augenstein, A. Søgaard

    Latent multi-task architecture learning

    Proceedings of the AAAI Conference on Artificial Intelligence (2019), pp. 4822-4829

  197. Rueden et al., 2019

    L. von Rueden, S. Mayer, K. Beckh, B. Georgiev, S. Giesselbach, R. Heese, B. Kirsch, J. Pfrommer, A. Pick, R. Ramamurthy

    Informed machine learning-A taxonomy and survey of integrating knowledge into learning systems

    arXiv (2019)

    arXiv:1903.12394

  198. Ruiz et al., 2018

    N. Ruiz, S. Schulter, M. Chandraker

    Learning to simulate

    arXiv (2018)

    arXiv:1810.02513

  199. Sacha et al., 2016

    D. Sacha, L. Zhang, M. Sedlmair, J.A. Lee, J. Peltonen, D. Weiskopf, S.C. North, D.A. Keim

    Visual interaction with dimensionality reduction: a structured literature analysis

    IEEE Trans. Vis. Comput. Graph., 23 (2016), pp. 241-250

  200. Saito et al., 2015

    P.T. Saito, C.T. Suzuki, J.F. Gomes, P.J. de Rezende, A.X. Falcão

    Robust active learning for the diagnosis of parasites

    Pattern Recognit., 48 (2015), pp. 3572-3583

  201. Salakhutdinov and Hinton, 2009

    R. Salakhutdinov, G. Hinton

    Deep Boltzmann machines

    D. van Dyk, M. Welling (Eds.), Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR (2009), pp. 448-455

  202. Salamon and Bello, 2017

    J. Salamon, J.P. Bello

    Deep convolutional neural networks and data augmentation for environmental sound classification

    IEEE Signal Process. Lett., 24 (2017), pp. 279-283

  203. Saldanha et al., 2019

    E. Saldanha, B. Praggastis, T. Billow, D.L. Arendt

    ReLVis: visual analytics for situational awareness during reinforcement learning experimentation

    EuroVis (2019), pp. 43-47

  204. Samaniego et al., 2020

    E. Samaniego, C. Anitescu, S. Goswami, V.M. Nguyen-Thanh, H. Guo, K. Hamdia, X. Zhuang, T. Rabczuk

    An energy approach to the solution of partial differential equations in computational mechanics via machine learning: concepts, implementation and applications

    Comput. Methods Appl. Mech. Eng., 362 (2020), p. 112790

  205. Sargano et al., 2017

    A.B. Sargano, X. Wang, P. Angelov, Z. Habib

    Human action recognition using transfer learning with deep representations

    2017 International Joint Conference on Neural Networks, IEEE (2017), pp. 463-469

  206. Segler et al., 2018

    M.H. Segler, M. Preuss, M.P. Waller

    Planning chemical syntheses with deep neural networks and symbolic AI

    Nature, 555 (2018), pp. 604-610

  207. Senior et al., 2020

    A.W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A.W. Nelson, A. Bridgland

    Improved protein structure prediction using potentials from deep learning

    Nature, 577 (2020), pp. 706-710

  208. Settles, 2011

    B. Settles

    Closing the loop: fast, interactive semi-supervised annotation with queries on features and instances

    Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (2011), pp. 1467-1478

  209. Settles, 2012

    B. Settles

    Active learning

    Synth. Lect. Artif. Intell. Mach. Learn., 6 (2012), pp. 1-114

  210. Settles and Craven, 2008

    B. Settles, M. Craven

    An analysis of active learning strategies for sequence labeling tasks

    Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (2008), pp. 1070-1079

  211. Shah et al., 2014

    A. Shah, A. Wilson, Z. Ghahramani

    Student-t processes as alternatives to Gaussian processes

    S. Kaski, J. Corander (Eds.), Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR (2014), pp. 877-885

  212. Shental et al., 2004

    N. Shental, A. Bar-Hillel, T. Hertz, D. Weinshall

    Computing Gaussian mixture models with EM using equivalence constraints

    S. Thrun, L.K. Saul, B. Schölkopf (Eds.), Advances in Neural Information Processing Systems, MIT Press (2004), pp. 465-472

  213. Shi et al., 2019

    G. Shi, X. Shi, M. O'Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, S.J. Chung

    Neural lander: stable drone landing control using learned dynamics

    2019 International Conference on Robotics and Automation, IEEE (2019), pp. 9784-9790

  214. Shivakumar and Georgiou, 2020

    P.G. Shivakumar, P. Georgiou

    Transfer learning from adult to children for speech recognition: evaluation, analysis and recommendations

    Comput. Speech Lang., 63 (2020), p. 101077

  215. Shorten and Khoshgoftaar, 2019

    C. Shorten, T.M. Khoshgoftaar

    A survey on image data augmentation for deep learning

    J. Big Data, 6 (2019), p. 60

  216. Silver et al., 2016

    D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot

    Mastering the game of Go with deep neural networks and tree search

    Nature, 529 (2016), pp. 484-489

  217. Silver et al., 2014

    D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, M. Riedmiller

    Deterministic policy gradient algorithms

    Proceedings of the 31st International Conference on Machine Learning (2014), pp. 1387-1395

  218. Silver et al., 2017

    D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton

    Mastering the game of Go without human knowledge

    Nature, 550 (2017), pp. 354-359

  219. Simard et al., 2003

    P.Y. Simard, D. Steinkraus, J.C. Platt

    Best practices for convolutional neural networks applied to visual document analysis

    Proceedings of the Seventh International Conference on Document Analysis and Recognition (2003), pp. 958-963

  220. Sindhwani et al., 2009

    V. Sindhwani, P. Melville, R.D. Lawrence

    Uncertainty sampling and transductive experimental design for active dual supervision

    Proceedings of the 26th Annual International Conference on Machine Learning (2009), pp. 953-960

  221. Sinha and Zhao, 2008

    A.P. Sinha, H. Zhao

    Incorporating domain knowledge into data mining classifiers: an application in indirect lending

    Decis. Support Syst., 46 (2008), pp. 287-299

  222. Sinha et al., 2019

    S. Sinha, S. Ebrahimi, T. Darrell

    Variational adversarial active learning

    Proceedings of the IEEE International Conference on Computer Vision (2019), pp. 5972-5981

  223. Small et al., 2011

    K. Small, B.C. Wallace, C.E. Brodley, T.A. Trikalinos

    The constrained weight space SVM: learning with ranked features

    Proceedings of the 28th International Conference on Machine Learning (2011), pp. 865-872

  224. Song et al., 2012

    Y. Song, Y. Li, C. Li, G. Zhang

    An efficient initialization approach of Q-learning for mobile robots

    Int. J. Control. Autom., 10 (2012), pp. 166-172

  225. Speer et al., 2016

    R. Speer, J. Chin, C. Havasi

    ConceptNet 5.5: an open multilingual graph of general knowledge

    arXiv (2016)

    arXiv:1612.03975

  226. Srivastava et al., 2014

    N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov

    Dropout: a simple way to prevent neural networks from overfitting

    J. Mach. Learn. Res., 15 (2014), pp. 1929-1958

  227. Stewart and Ermon, 2017

    R. Stewart, S. Ermon

    Label-free supervision of neural networks with physics and domain knowledge

    Proceedings of the 31st AAAI Conference on Artificial Intelligence (2017), pp. 2576-2582

  228. Su et al., 2014

    C. Su, M.E. Borsuk, A. Andrew, M. Karagas

    Incorporating prior expert knowledge in learning Bayesian networks from genetic epidemiological data

    2014 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (2014), pp. 1-5

  229. Su, 2018

    J. Su

    GAN-QP: a novel GAN framework without gradient vanishing and Lipschitz constraint

    arXiv (2018)

    arXiv:1811.07296

  230. Su et al., 2019

    J. Su, D.V. Vargas, K. Sakurai

    One pixel attack for fooling deep neural networks

    IEEE Trans. Evol. Comput., 23 (2019), pp. 828-841

  231. Summers and Dinneen, 2019

    C. Summers, M.J. Dinneen

    Improved mixed-example data augmentation

    2019 IEEE Winter Conference on Applications of Computer Vision (2019), pp. 1262-1270

  232. Sun et al., 2018

    J. Sun, Y. Fu, Q. Wan

    Organic synaptic devices for neuromorphic systems

    J. Phys. D Appl. Phys., 51 (2018), p. 314004

  233. Sutton and Barto, 2018

    R.S. Sutton, A.G. Barto

    Reinforcement Learning: An Introduction

    MIT Press (2018)

  234. Sutton et al., 2000

    R.S. Sutton, D.A. McAllester, S.P. Singh, Y. Mansour

    Policy gradient methods for reinforcement learning with function approximation

    S.A. Solla, T.K. Leen, K. Müller (Eds.), Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation (2000), pp. 1057-1063

  235. Tercan et al., 2018

    H. Tercan, A. Guajardo, J. Heinisch, T. Thiele, C. Hopmann, T. Meisen

    Transfer-learning: bridging the gap between real and simulation data for machine learning in injection molding

    Procedia CIRP, 72 (2018), pp. 185-190

  236. Thomson, 2010

    A.M. Thomson

    Neocortical layer 6, a review

    Front. Neuroanat., 4 (2010), p. 13

  237. Tian et al., 2015

    N. Tian, Z. Ji, C.-H. Lai

    Simultaneous estimation of nonlinear parameters in parabolic partial differential equation using quantum-behaved particle swarm optimization with Gaussian mutation

    Int. J. Mach. Learn. Cybern., 6 (2015), pp. 307-318

  238. Trottier et al., 2017

    L. Trottier, P. Giguere, B. Chaib-draa

    Multi-task learning by deep collaboration and application in facial landmark detection

    arXiv (2017)

    arXiv:1711.00111

  239. Tuchman et al., 2020

    Y. Tuchman, T.N. Mangoma, P. Gkoupidenis, Y. van de Burgt, R.A. John, N. Mathews, S.E. Shaheen, R. Daly, G.G. Malliaras, A. Salleo

    Organic neuromorphic devices: past, present, and future challenges

    MRS Bull., 45 (2020), pp. 619-630

  240. Vaswani et al., 2017

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin

    Attention is all you need

    I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Curran Associates (2017), pp. 5998-6008

  241. Waibel et al., 1989

    A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K.J. Lang

    Phoneme recognition using time-delay neural networks

    IEEE Trans. Signal Process., 37 (1989), pp. 328-339

  242. Wang et al., 2018

    J. Wang, L. Gou, H.-W. Shen, H. Yang

    DQNViz: a visual analytics approach to understand deep Q-networks

    IEEE Trans. Vis. Comput. Graph., 25 (2018), pp. 288-298

  243. Wang et al., 2019

    Z. Wang, Z. Dai, B. Póczos, J. Carbonell

    Characterizing and avoiding negative transfer

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019), pp. 11293-11302

  244. Wang et al., 2016

    Z. Wang, B. Du, L. Zhang, L. Zhang

    A batch-mode active learning framework by querying discriminative and representative samples for hyperspectral image classification

    Neurocomputing, 179 (2016), pp. 88-100

  245. Ware et al., 2001

    M. Ware, E. Frank, G. Holmes, M. Hall, I.H. Witten

    Interactive machine learning: letting users build classifiers

    Int. J. Hum. Comput. Stud., 55 (2001), pp. 281-292

  246. Watkins and Dayan, 1992

    C.J. Watkins, P. Dayan

    Q-learning

    Mach. Learn., 8 (1992), pp. 279-292

  247. Weinberger and Saul, 2006

    K.Q. Weinberger, L.K. Saul

    An introduction to nonlinear dimensionality reduction by maximum variance unfolding

    AAAI Proceedings of the 21st National Conference on Artificial Intelligence (2006), pp. 1683-1686

  248. Wen et al., 2020

    Q. Wen, L. Sun, X. Song, J. Gao, X. Wang, H. Xu

    Time series data augmentation for deep learning: a survey

    arXiv (2020)

    arXiv:2002.12478

  249. Whitrow et al., 2009

    C. Whitrow, D.J. Hand, P. Juszczak, D. Weston, N.M. Adams

    Transaction aggregation as a strategy for credit card fraud detection

    Data Min. Knowl. Discov., 18 (2009), pp. 30-55

  250. Willett et al., 2006

    R. Willett, R. Nowak, R.M. Castro

    Faster rates in regression via active learning

    Y. Weiss, B. Schölkopf, J.C. Platt (Eds.), Advances in Neural Information Processing Systems, MIT Press (2006), pp. 179-186

  251. Williams et al., 2017

    J.D. Williams, K. Asadi, G. Zweig

    Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

    arXiv (2017)

    arXiv:1702.03274

  252. Wold et al., 1987

    S. Wold, K. Esbensen, P. Geladi

    Principal component analysis

    Chemom. Intell. Lab. Syst., 2 (1987), pp. 37-52

  253. Wong et al., 2016

    S.C. Wong, A. Gatt, V. Stamatescu, M.D. McDonnell

    Understanding data augmentation for classification: when to warp?

    2016 International Conference on Digital Image Computing: Techniques and Applications, IEEE (2016), pp. 1-6

  254. Worrall et al., 2017

    D.E. Worrall, S.J. Garbin, D. Turmukhambetov, G.J. Brostow

    Harmonic networks: deep translation and rotation equivariance

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 5028-5037

  255. Wu et al., 2018

    B. Wu, S. Han, K.G. Shin, W. Lu

    Application of artificial neural networks in design of lithium-ion batteries

    J. Power Sources, 395 (2018), pp. 128-136

  256. Wu et al., 2020

    Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, P.S. Yu

    A comprehensive survey on graph neural networks

    IEEE Trans. Neural Netw. Learn. Syst. (2020), 10.1109/TNNLS.2020.2978386

  257. Wu et al., 2019

    Z. Wu, C. Shen, A. van den Hengel

    Wider or deeper: revisiting the ResNet model for visual recognition

    Pattern Recognit., 90 (2019), pp. 119-133

  258. Xu et al., 2015

    J.-G. Xu, Y. Zhao, J. Chen, C. Han

    A structure learning algorithm for Bayesian network using prior knowledge

    J. Comput. Sci. Technol., 30 (2015), pp. 713-724

  259. Xu et al., 2014

    X. Xu, L. Zuo, Z. Huang

    Reinforcement learning algorithms with function approximation: recent advances and applications

    Inf. Sci., 261 (2014), pp. 1-31

  260. Yang and Loog, 2018

    Y. Yang, M. Loog

    A variance maximization criterion for active learning

    Pattern Recognit., 78 (2018), pp. 358-370

  261. Yang et al., 2015

    Y. Yang, Z. Ma, F. Nie, X. Chang, A.G. Hauptmann

    Multi-class active learning by uncertainty sampling with diversity maximization

    Int. J. Comput. Vis., 113 (2015), pp. 113-127

  262. Ye et al., 2003

    C. Ye, N.H. Yung, D. Wang

    A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance

    IEEE Trans. Syst. Man. Cybern. Syst., 33 (2003), pp. 17-27

  263. Ying et al., 2018

    Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, J. Leskovec

    Hierarchical graph representation learning with differentiable pooling

    S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Curran Associates (2018), pp. 4800-4810

  264. Yuan et al., 2016

    H. Yuan, I. Paskov, H. Paskov, A.J. González, C.S. Leslie

    Multitask learning improves prediction of cancer drug sensitivity

    Sci. Rep., 6 (2016), p. 31619

  265. Yun et al., 2017

    S. Yun, J. Choi, Y. Yoo, K. Yun, J.Y. Choi

    Action-decision networks for visual tracking with deep reinforcement learning

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 2711-2720

  266. Zeiler and Fergus, 2014

    M.D. Zeiler, R. Fergus

    Visualizing and understanding convolutional networks

    European Conference on Computer Vision, Springer (2014), pp. 818-833

  267. Zhang et al., 2018

    Q. Zhang, Y.N. Wu, S.-C. Zhu

    Interpretable convolutional neural networks

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 8827-8836

  268. Zhang, 2019

    R. Zhang

    Making convolutional networks shift-invariant again

    arXiv (2019)

    arXiv:1904.11486

  269. Zhang et al., 2019

    Z. Zhang, A. Kag, A. Sullivan, V. Saligrama

    Equilibrated recurrent neural network: neuronal time-delayed self-feedback improves accuracy and stability

    arXiv (2019)

    arXiv:1903.00755

  270. Zhang et al., 2014

    Z. Zhang, P. Luo, C.C. Loy, X. Tang

    Facial landmark detection by deep multi-task learning

    European Conference on Computer Vision, Springer (2014), pp. 94-108

  271. Zhao et al., 2019

    L. Zhao, Z. Chen, L.T. Yang, M.J. Deen, Z.J. Wang

    Deep semantic mapping for heterogeneous multimedia transfer learning using co-occurrence data

    ACM Trans. Multimedia Comput. Commun. Appl., 15 (2019), pp. 1-21

  272. Zhong et al., 2020

    Z. Zhong, L. Zheng, G. Kang, S. Li, Y. Yang

    Random erasing data augmentation

    Proceedings of the 34th AAAI Conference on Artificial Intelligence (2020), pp. 13001-13008

  273. Zhou et al., 2019

    J.T. Zhou, S.J. Pan, I.W. Tsang

    A deep learning framework for hybrid heterogeneous transfer learning

    Artif. Intell., 275 (2019), pp. 310-328

  274. Zhou et al., 2014

    J.T. Zhou, S.J. Pan, I.W. Tsang, Y. Yan

    Hybrid heterogeneous transfer learning through deep learning

    Proceedings of the 28th AAAI Conference on Artificial Intelligence (2014), pp. 2213-2219

  275. Zhou et al., 2017

    S. Zhou, M.K. Helwa, A.P. Schoellig

    Design of deep neural networks as add-on blocks for improving impromptu trajectory tracking

    IEEE 56th Annual Conference on Decision and Control (2017), pp. 5201-5207

  276. Zou et al., 2019

    Z. Zou, Z. Shi, Y. Guo, J. Ye

    Object detection in 20 years: a survey

    arXiv (2019)

    arXiv:1905.05055
