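Note: this is a translation of an article on the business competition logic of machine learning.
The English text is quoted from the original; the Chinese text is machine-translated and has not been proofread.
If anything looks wrong, please consult the original article.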
How to win with machine learning And how to catch up if you're lagging behind
如何借助机器学习制胜 以及落后时该如何迎头赶上
Ajay Agrawal
Professor, Rotman School of Management
阿杰伊・阿格拉沃尔
多伦多大学罗特曼管理学院 教授
Joshua Gans
Professor, Rotman School of Management
乔舒亚・甘斯
多伦多大学罗特曼管理学院 教授
Avi Goldfarb
Professor, Rotman School of Management
阿维・戈德法布
多伦多大学罗特曼管理学院 教授
ILLUSTRATOR PETER GREENWOOD
插画师 彼得・格林伍德
Harvard Business Review September–October 2020
《哈佛商业评论》2020 年 9 - 10 月刊 第 126 页
TECHNOLOGY
科技领域
THE PAST DECADE HAS BROUGHT tremendous advances in an exciting dimension of artificial intelligence: machine learning. This technique for taking data inputs and turning them into predictions has enabled tech giants such as Amazon, Apple, Facebook, and Google to dramatically improve their products. It has also spurred start-ups to launch new products and platforms, sometimes even in competition with Big Tech.
过去十年,人工智能中极具前景的机器学习领域取得了巨大进展。这种将数据输入转化为预测结果的技术,让亚马逊、苹果、脸书和谷歌等科技巨头得以大幅升级其产品,也推动初创企业推出新的产品和平台,有时这些初创企业甚至会与科技巨头展开竞争。
Consider BenchSci, a Toronto-based company that seeks to speed the drug development process. It aims to make it easier for scientists to find needles in haystacks, to zero in on the most crucial information embedded in pharma companies' internal databases and in the vast wealth of published scientific research. To get a new drug candidate into clinical trials, scientists must run costly and time-consuming experiments. BenchSci realized that scientists could conduct fewer of these (and achieve greater success) if they applied better insights from the huge number of experiments that had already been run.
以总部位于多伦多的 BenchSci 公司为例,该公司致力于加快药物研发进程。其目标是帮助科学家更轻松地在海量信息中找到关键内容 ------ 精准定位制药企业内部数据库和大量已发表科研成果中蕴含的信息。要让一款新的候选药物进入临床试验阶段,科学家必须开展成本高昂且耗时长久的实验。BenchSci 发现,若能从已完成的海量实验中提炼出更有价值的洞见,科学家就能减少实验次数,同时提升实验成功率。
Indeed, BenchSci found that if scientists took advantage of machine learning that read, classified, and then presented insights from scientific research, they could halve the number of experiments normally required to advance a drug to clinical trials. More specifically, they could use the technology to find the right biological reagents, essential substances for influencing and measuring protein expression. Identifying those by combing through the published literature rather than rediscovering them from scratch helps significantly cut the time it takes to produce new drug candidates. That adds up to potential savings of over $17 billion annually, which, in an industry where the returns to R&D have become razor-thin, could transform the market. In addition, many lives could be saved by bringing new drugs to market more quickly.
事实上,BenchSci 的研究发现,若科学家借助机器学习技术对科研成果进行读取、分类并提炼洞见,将候选药物推进至临床试验阶段所需的实验次数可减少一半。更具体地说,科学家能通过这一技术找到合适的生物试剂 ------ 这类试剂是影响和检测蛋白质表达的关键物质。通过梳理已发表的文献来确定适用试剂,而非从头探索,能大幅缩短研发新候选药物的时间。这一方式每年有望为行业节省超 170 亿美元的成本,而在医药研发回报率持续走低的当下,这笔成本节约足以重塑整个市场。此外,新药的加速上市也能挽救更多生命。
What is remarkable here is that BenchSci, in its specialized domain, is doing something akin to what Google has been doing for the whole of the internet: using machine learning to lead in search. Just as Google can help you figure out how to fix your dishwasher and save you a long trip to the library or a costly repair service, BenchSci helps scientists identify a suitable reagent without incurring the trouble or expense of excessive research and experimentation. Previously, scientists would often use Google or PubMed to search the literature (a process that took days), then read the literature (again spending days), and then order and test three to six reagents before choosing one (over a period of weeks). Now they search BenchSci in minutes and then order and test one to three reagents before choosing one (conducting fewer tests over fewer weeks).
值得关注的是,BenchSci 在其专业领域所做的事,与谷歌在整个互联网领域的举措异曲同工:借助机器学习在搜索领域占据领先地位。就像谷歌能帮你找到修复洗碗机的方法,省去你跑图书馆的周折和聘请维修人员的高额费用一样,BenchSci 能帮助科学家找到合适的试剂,避免他们进行大量无谓的研究和实验,节省时间与成本。在此之前,科学家通常会用谷歌或 PubMed 检索文献(这一过程耗时数天),随后研读文献(又要花费数天),再订购 3 至 6 种试剂进行测试,最终选定一种(整个测试过程耗时数周)。而如今,科学家只需在 BenchSci 上花费数分钟检索,再订购 1 至 3 种试剂进行测试就能确定最终选择(测试的次数和耗时都大幅减少)。
IDEA IN BRIEF
观点简述
THE CHALLENGE
挑战
As more companies deploy machine learning for AI-enabled products and services, they face the challenge of carving out a defensible market position, especially if they are latecomers.
随着越来越多的企业将机器学习应用于人工智能产品和服务,它们亟需打造自身难以被撼动的市场地位,对于行业后来者而言,这一挑战尤为突出。
HOW TO GET AHEAD
制胜之道
The most successful AI users capture a good pool of training data early and then exploit feedback data to open up a value gap, in terms of prediction quality, between themselves and later movers.
最成功的人工智能应用者会尽早获取优质的训练数据资源,而后利用反馈数据,在预测质量上与行业后来者拉开价值差距。
HOW TO CATCH UP
迎头赶上的方法
Latecomers can still secure a foothold if they can find sources of superior training data or feedback data, or if they tailor their predictions to a specific niche.
行业后来者若能找到优质的训练数据或反馈数据来源,或是针对特定细分领域优化预测模型,仍能在市场中站稳脚跟。
Many companies are already working with AI and are aware of the practical steps for integrating it into their operations and leveraging its power. But as that proficiency grows, companies will need to consider a broader issue: How do you take advantage of machine learning to create a defensible moat around the business, to create something that competitors can't easily imitate? In BenchSci's case, for instance, will its initial success attract competition from Google, and if so, how does BenchSci retain its lead?
如今许多企业已开始运用人工智能,也掌握了将其融入业务运营、发挥其价值的实操方法。但随着企业运用人工智能的熟练度不断提升,它们需要思考一个更宏观的问题:如何借助机器学习为企业构建难以被复制的竞争壁垒?以 BenchSci 为例,其初步的成功是否会引来谷歌的竞争?如果答案是肯定的,它又该如何保持领先地位?
In the following pages, we explain how companies entering industries with an AI-enabled product or service can build a sustainable competitive advantage and raise entry barriers against latecomers. We note that moving early can often be a big plus, but it's not the whole story. As we discuss, late adopters of the new technology can still advance, or at least recover some lost ground, by finding a niche.
在接下来的内容中,我们将阐述,凭借人工智能产品或服务入局各行业的企业,该如何打造可持续的竞争优势、提高行业后来者的进入壁垒。我们认为,先发制人往往能占据巨大优势,但这并非成功的全部。正如我们将探讨的,人工智能技术的后发企业仍可通过深耕细分领域实现发展,至少能挽回部分劣势。
MAKING PREDICTIONS WITH AI
借助人工智能实现预测
Businesses use machine learning to recognize patterns and then make predictions about what will appeal to customers, improve operations, or help make a product better. Before you can build a strategy around such predictions, however, you must understand the inputs necessary for the prediction process, the challenges involved in getting those inputs, and the role of feedback in enabling an algorithm to make better predictions over time.
企业运用机器学习识别数据规律,进而做出各类预测 ------ 比如预测消费者的偏好、能优化运营的举措,或是能改进产品的方法。然而,要围绕这些预测制定战略,企业必须先了解预测过程所需的输入数据、获取这些数据面临的挑战,以及反馈机制在推动算法持续优化预测效果的过程中所发挥的作用。
A prediction, in the context of machine learning, is an information output that comes from entering some data and running an algorithm. For example, when your mobile navigation app serves up a prediction about the best route between two points, it uses input data on traffic conditions, speed limits, road size, and other factors. An algorithm is then employed to predict the fastest way to go and the time that will take.
在机器学习的语境中,预测是将数据输入算法后得到的信息输出结果。例如,手机导航软件为你预测两点之间的最佳路线时,会先获取交通状况、限速标准、道路规模等输入数据,再通过算法计算出最快的行驶路线和所需的时间。
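To make the "data in, algorithm, prediction out" idea concrete, here is a minimal Python sketch of the navigation example. Everything in it is an assumption made for illustration: the `Segment` fields, the linear congestion rule, and the sample routes are invented, and a real navigation app would learn its travel-time model from historical data rather than use a hand-written formula.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of road, described by the kinds of inputs the article lists."""
    length_km: float
    speed_limit_kmh: float
    congestion: float  # hypothetical scale: 0.0 = free flowing, 1.0 = standstill


def predict_minutes(route):
    """Predict the travel time of a route (a list of Segments) in minutes."""
    total_hours = 0.0
    for seg in route:
        # Assumed toy rule: congestion reduces speed linearly, never below 5 km/h.
        speed = max(seg.speed_limit_kmh * (1.0 - seg.congestion), 5.0)
        total_hours += seg.length_km / speed
    return total_hours * 60.0


def best_route(routes):
    """Return the name and predicted minutes of the fastest route."""
    name = min(routes, key=lambda n: predict_minutes(routes[n]))
    return name, predict_minutes(routes[name])


routes = {
    "highway": [Segment(20.0, 100.0, 0.5)],  # longer, but a higher speed limit
    "city": [Segment(12.0, 50.0, 0.2)],      # shorter, with lighter traffic
}
name, minutes = best_route(routes)
print(name, round(minutes, 1))
```

With these invented numbers the shorter city route wins, because heavy congestion cancels out the highway's higher speed limit. The prediction is only as good as the traffic and road inputs that feed it.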
The key challenge with any prediction process is that training data (the inputs you need in order to start getting reasonable outcomes) has to be either created (by, say, hiring experts to classify things) or procured from existing sources (say, health records). Some kinds of data are easy to acquire from public sources (think of weather and map information). Consumers may also willingly supply personal data if they perceive a benefit from doing so. Fitbit and Apple Watch users, for example, allow the companies to gather metrics about their exercise level, calorie intake, and so forth through devices that users wear to manage their health and fitness.
所有预测过程的挑战在于,训练数据 ------ 即想要得到合理预测结果所需的基础输入数据 ------ 要么需要企业主动构建(比如聘请专家对数据进行分类标注),要么需要从现有渠道获取(比如医疗记录)。部分数据能从公共渠道轻松获取(比如天气和地图信息)。如果消费者能感知到提供个人数据的价值,也会愿意主动分享。例如,Fitbit 和苹果手表的用户为了管理自身健康和健身状况佩戴设备,同时也允许企业通过这些设备收集自己的运动强度、卡路里摄入量等数据指标。
Obtaining training data to enable predictions can be difficult, however, if it requires the cooperation of a large number of individuals who do not directly benefit from providing it. For instance, a navigation app can collect data about traffic conditions by tracking users and getting reports from them. This allows the app to identify likely locations for traffic jams and to alert other drivers who are heading toward them. But drivers already caught in the snarls get little direct payoff from participating, and they may be troubled by the idea that the app knows where they are at any moment (and is potentially recording their movements). If people in traffic jams decline to share their data or actually switch off their geolocators, the app's ability to warn users of traffic problems will be compromised.
但如果获取预测所需的训练数据,需要大量无法从中直接获益的个人配合,那么数据获取的难度会大幅增加。例如,导航软件可通过追踪用户、收集用户反馈来获取交通状况数据,从而预判拥堵路段,并向即将驶入该路段的司机发出预警。但对于已经陷入交通拥堵的司机而言,分享数据几乎没有直接收益,他们还会因软件能实时掌握其位置(且可能记录其行驶轨迹)而感到顾虑。如果遭遇拥堵的用户拒绝分享数据,或是直接关闭定位功能,软件的交通预警能力就会大打折扣。
Another challenge may be the need to periodically update training data. This isn't always an issue; it won't apply if the basic context in which the prediction was made stays constant. Radiology, for example, analyzes human physiology, which is generally consistent from person to person and over time. Thus, after a certain point, the marginal value of an extra record in the training database is almost zero. However, in other cases algorithms may need to be frequently updated with completely new data reflecting changes in the underlying environment. With navigational apps, for instance, new roads or traffic circles, renamed streets, and similar changes will render the app's predictions less accurate over time unless the maps that form part of the initial training data are updated.
另一个挑战是,企业可能需要定期更新训练数据。这一问题并非始终存在:如果预测的基础场景保持不变,就无需频繁更新数据。例如,放射学分析的是人类的生理结构,而人类的生理结构在个体之间、不同时间维度上基本保持稳定。因此,当训练数据库的规模达到一定程度后,新增一条数据的边际价值几乎为 0。但在其他场景中,算法需要不断纳入能反映环境变化的全新数据,才能保持准确性。以导航软件为例,新道路、新环岛的修建,街道更名等变化,都会让软件的预测结果逐渐失真,除非作为初始训练数据的地图得到及时更新。
In many situations, algorithms can be continuously improved through the use of feedback data, which is obtained by mapping actual outcomes to the input data that generated predictions of those outcomes. This tool is particularly helpful in situations where there can be considerable variation within clearly defined boundaries. For instance, when your phone uses an image of you for security, you will have initially trained the phone to recognize you. But your face can change significantly. You may or may not be wearing glasses. You may have gotten a new hairstyle, put on makeup, or gained or lost weight. Thus the prediction that you are you may become less reliable if the phone relies solely on the initial training data. But what actually happens is that the phone updates its algorithm using all the images you provide each time you unlock it.
在许多场景中,算法能通过反馈数据实现持续优化。反馈数据的获取方式,是将实际结果与用于预测该结果的输入数据进行匹配分析。当场景的边界清晰,但内部存在大量变量时,反馈数据的作用尤为显著。例如,手机通过面部识别进行身份验证时,用户会先完成初始的面部信息录入,让手机学习识别自己的面部特征。但人的面部特征会发生较大变化:你可能戴眼镜,也可能不戴;你可能换了新发型、化了妆,也可能体重有所增减。因此,如果手机仅依靠初始训练数据进行识别,其身份验证的准确性会大幅下降。而实际情况是,每次你解锁手机时,手机都会将你提供的面部图像纳入算法训练,持续更新优化识别模型。
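The face-unlock feedback loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real phone works: faces are reduced to two-number feature vectors, the hypothetical `FaceUnlock` class matches by Euclidean distance, and feedback is modeled as a moving average that nudges the stored template toward each accepted image.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class FaceUnlock:
    """Toy matcher that compares a face to a stored template (hypothetical)."""

    def __init__(self, enrolled, threshold=1.0, learning_rate=0.5):
        self.template = list(enrolled)      # initial training data
        self.threshold = threshold          # maximum distance that still unlocks
        self.learning_rate = learning_rate  # 0.0 disables learning from feedback

    def try_unlock(self, features):
        accepted = distance(self.template, features) <= self.threshold
        if accepted:
            # Feedback step: each successful unlock nudges the template
            # toward the newest image of the owner.
            lr = self.learning_rate
            self.template = [(1 - lr) * t + lr * f
                             for t, f in zip(self.template, features)]
        return accepted


def count_unlocks(learning_rate):
    """Count successful unlocks while the owner's appearance slowly drifts."""
    phone = FaceUnlock([0.0, 0.0], learning_rate=learning_rate)
    drifting_face = [[0.4 * step, 0.0] for step in range(1, 8)]
    return sum(phone.try_unlock(face) for face in drifting_face)


static_unlocks = count_unlocks(0.0)    # relies on initial training data only
adaptive_unlocks = count_unlocks(0.5)  # also learns from every unlock
print(static_unlocks, adaptive_unlocks)
```

In this sketch the static model stops recognizing the owner once the face has drifted past the threshold, while the model that learns from each unlock keeps up. The same update rule would also learn from a look-alike who gets in, so the loop only helps when the feedback really comes from the owner.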
Creating these kinds of feedback loops is far from straightforward in dynamic contexts and where feedback cannot be easily categorized and sourced. Feedback data for the smartphone face-recognition app, for example, creates better predictions only if the sole person inputting facial data is the phone's owner. If other people look similar enough to get into the phone and continue using it, the phone's prediction that the user is the owner becomes unreliable.
在动态变化的场景中,或是当反馈数据难以分类、溯源时,构建这类反馈循环并非易事。以手机面部识别的反馈数据为例,只有当录入面部数据的人始终是手机机主时,反馈数据才能推动识别模型优化。如果有外貌与机主高度相似的人解锁并使用手机,这些错误的反馈数据会让手机的身份验证预测结果失去可靠性。
It can also be dangerously easy to introduce biases into machine learning, especially if multiple factors are in play. Suppose a lender uses an AI-enabled process to assess the credit risk of loan applicants, considering their income level, employment history, demographic characteristics, and so forth. If the training data for the algorithm discriminates against a certain group (say, people of color), the feedback loop will perpetuate or even accentuate that bias, making it increasingly likely that applicants of color are rejected. Feedback is almost impossible to incorporate safely into an algorithm without carefully defined parameters and reliable, unbiased sources.
机器学习模型也极易产生偏见,尤其是在模型涉及多个影响因素时,这种偏见的产生会带来极大风险。假设一家贷款机构运用人工智能系统评估借款人的信用风险,评估维度包括收入水平、就业经历、人口统计特征等。如果算法的训练数据本身就对某一群体存在歧视 ------ 比如有色人种,那么反馈循环会让这种偏见持续存在甚至不断加剧,导致有色人种的贷款申请被拒绝的概率越来越高。若没有明确的参数设定和可靠、无偏的数据源,反馈数据几乎无法被安全地纳入算法优化过程。
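One mechanism by which a feedback loop can lock in a biased starting point is that a lender only observes repayment for the loans it approves. The Python sketch below illustrates that mechanism with invented numbers: the groups, repayment rates, cutoff, and update rule are all assumptions chosen for illustration, and the simulation uses expected values instead of random draws so the result is deterministic.

```python
def simulate(rounds, cutoff=0.7, learning_rate=0.5):
    """Run a toy lending loop and return final estimates and approval counts."""
    # Assumption: both groups actually repay at the same rate.
    true_repay = {"A": 0.9, "B": 0.9}
    # Assumption: biased training data underestimates group B at the start.
    estimate = {"A": 0.9, "B": 0.6}
    approvals = {"A": 0, "B": 0}
    for _ in range(rounds):
        for group in estimate:
            if estimate[group] >= cutoff:
                approvals[group] += 1
                # Feedback exists only for approved loans, so only approved
                # groups ever have their estimate corrected toward the truth.
                observed = true_repay[group]
                estimate[group] += learning_rate * (observed - estimate[group])
    return estimate, approvals


estimate, approvals = simulate(rounds=10)
print(estimate, approvals)
```

Group B is never approved, so no feedback ever arrives to correct its estimate, and the initial bias persists for as long as the loop runs even though both groups repay equally well.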
BUILDING COMPETITIVE ADVANTAGE IN PREDICTION
打造预测领域的竞争优势
In many ways, building a sustainable business in machine learning is much like building a sustainable business in any industry. You have to come in with a sellable product, carve out a defensible early position, and make it harder for anyone to come in behind you. Whether you can do that depends on your answers to three questions:
从诸多方面来看,依托机器学习打造可持续发展的企业,与在其他行业打造可持续企业的逻辑大同小异:企业需要推出有市场需求的产品,尽早确立难以被撼动的市场地位,提高后来者的进入难度。而企业能否做到这些,取决于对以下三个问题的答案:
1. Do you have enough training data? At the get-go, a prediction machine needs to generate predictions that are good enough to be commercially viable. The definition of "good enough" might be set by regulation (for example, an AI for making medical diagnoses must meet government standards), usability (a chatbot has to work smoothly enough for callers to respond to the machine rather than wait to speak to a human in the call center), or competition (a company seeking to enter the internet search market needs a certain level of predictive accuracy to compete with Google). One barrier to entry, therefore, is the amount of time and effort involved in creating or accessing sufficient training data to make good-enough predictions.
第一,你是否拥有足够的训练数据?从一开始,预测模型就需要输出达到商业应用标准的预测结果。所谓 "符合商业标准",其界定可能来自监管要求(例如,用于医学诊断的人工智能必须达到政府制定的标准)、产品可用性要求(例如,智能聊天机器人的交互体验必须足够流畅,让来电者愿意与其沟通,而非等待人工客服),或是行业竞争要求(例如,想要进入互联网搜索市场的企业,其算法的预测准确率必须达到一定水平,才能与谷歌展开竞争)。因此,行业的进入壁垒之一,就是企业为获取足够的训练数据、实现符合商业要求的预测效果,所需投入的时间和精力。
This barrier can be high. Take the case of radiology, where a prediction machine needs to be measurably better than highly skilled humans in order to be trusted with people's lives. That suggests that the first company to build a generally applicable AI for radiology (one that can read any scanned image) will have little competition at first because so much data is needed for success. But the initial advantage may be short-lived if the market is growing rapidly, because in a fast-growing market the payoff from having access to the training data will probably be large enough to attract multiple big companies with deep pockets.
这一壁垒的门槛可能极高。以放射学领域为例,用于该领域的预测模型要想被托付以生命健康相关的诊断工作,其准确率必须显著高于资深的人类医生。这意味着,首家研发出通用型放射学人工智能(能解读各类医学影像)的企业,初期几乎不会面临竞争,因为研发这类模型需要海量的训练数据。但如果该市场增长迅速,企业的先发优势可能转瞬即逝 ------ 因为在高速增长的市场中,掌握训练数据所能带来的巨大收益,会吸引众多资金雄厚的大企业入局。
This, of course, means that training-data entry requirements are subject to economies of scale, like so much else. High-growth markets attract investments, and over time this raises the threshold for the next new entrant (and forces everyone already in the sector to spend more on developing or marketing their products). Thus the more data you can train your machines on, the bigger the hurdle for anyone coming after you, which brings us to the second question.
当然,这意味着训练数据的获取门槛和其他许多领域一样,受规模经济规律的影响。高速增长的市场会吸引大量投资,而随着时间推移,这些投资会不断抬高行业新进入者的门槛(同时也迫使行业内的现有企业加大产品研发和市场推广的投入)。因此,企业的模型训练数据量越大,为后来者设置的壁垒就越高。这就引出了第二个问题。
2. How fast are your feedback loops? Prediction machines exploit what has traditionally been the human advantage: they learn. If they can incorporate feedback data, then they can learn from outcomes and improve the quality of the next prediction.
第二,你的反馈循环效率如何?预测模型的优势,是掌握了原本属于人类的能力 ------ 学习能力。如果模型能纳入反馈数据,就能从实际结果中总结经验,优化后续的预测质量。
The extent of this advantage, however, depends on the time it takes to get feedback. With a radiology scan, if an autopsy is required to assess whether a machine-learning algorithm correctly predicted cancer, then feedback will be slow, and although a company may have an early lead in collecting and reading scans, it will be limited in its ability to learn and thus sustain its lead. By contrast, if feedback data can be generated quickly after obtaining the prediction, then an early lead will translate into a sustained competitive advantage, because the minimum efficient scale will soon be out of the reach of even the biggest companies.
但这一优势的发挥程度,取决于反馈数据的获取速度。以放射学影像诊断为例,若要通过尸检才能验证机器学习算法的癌症预测结果是否准确,那么反馈数据的获取速度会极慢。即便某家企业在医学影像的收集和解读上占据先发优势,其模型的学习和优化速度也会受限,难以长期保持领先。相反,如果在做出预测后能快速获取反馈数据,那么企业的先发优势将转化为可持续的竞争优势,因为模型所需的最小有效规模会迅速扩大,即便资金最雄厚的企业也难以企及。
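A back-of-the-envelope Python sketch shows why the length of the feedback delay matters so much. The model is deliberately crude and entirely assumed: the function `error_after`, the starting error rate, and the idea that each completed feedback cycle removes a fixed 10% of the remaining error are invented for illustration and do not come from the article.

```python
def error_after(days, feedback_delay_days, start_error=0.40, improvement=0.10):
    """Prediction error after `days`, given how long one feedback cycle takes."""
    # Only feedback cycles that have fully completed can improve the model.
    updates = days // feedback_delay_days
    # Assumed toy rule: every update removes a fixed share of the remaining error.
    return start_error * (1.0 - improvement) ** updates


fast = error_after(365, feedback_delay_days=1)    # feedback within a day
slow = error_after(365, feedback_delay_days=180)  # feedback after about six months
print(fast, slow)
```

Under these assumptions, a year of daily feedback drives the error close to zero, while a loop that closes twice a year barely moves it. That gap is the sense in which fast feedback turns an early lead into one that is hard to close.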
When Microsoft launched the Bing search engine in 2009, it had the company's full backing. Microsoft invested billions of dollars in it. Yet more than a decade later, Bing's market share remains far below Google's, in both search volume and search advertising revenue. One reason Bing found it hard to catch up was the feedback loop. In search, the time between the prediction (offering up a page with several suggested links in response to a query) and the feedback (the user's clicking on one of the links) is short, usually seconds. In other words, the feedback loop is fast and powerful.
2009 年,微软推出必应搜索引擎时,获得了公司的全力支持,投入了数十亿美元的资金。但十多年后,无论是搜索量还是搜索广告收入,必应的市场份额都远低于谷歌。必应难以实现赶超的原因之一,就是反馈循环的效率差距。在搜索引擎领域,从算法做出预测(根据用户的搜索关键词呈现相关链接页面)到获取反馈数据(用户点击其中某个链接)的时间极短,通常仅数秒。也就是说,搜索引擎的反馈循环效率高、效果显著。
By the time Bing entered the market, Google had already been operating an AI-based search engine for a decade or more, helping millions of users and performing billions of searches daily. Every time a user made a query, Google provided its prediction of the most relevant links, and then the user selected the best of those links, enabling Google to update its prediction model. That allowed for constant learning in light of a constantly expanding search space. With so much training data based on so many users, Google could identify new events and new trends more quickly than Bing could. In the end, the fast feedback loop, combined with other factors (Google's continued investment in massive data-processing facilities, and the real or perceived costs to customers of switching to another engine), meant that Bing always lagged. Other search engines that tried to compete with Google and Bing never even got started.
在必应入局时,谷歌的人工智能搜索引擎已运营了十多年,每天为数百万用户提供服务,处理数十亿次搜索请求。用户每发起一次搜索,谷歌就会预测并呈现最相关的链接,而用户的点击选择会成为反馈数据,推动谷歌持续更新预测模型。这让谷歌的算法能在不断扩大的搜索场景中持续学习。依托海量用户积累的训练数据,谷歌对新事件、新趋势的捕捉速度远快于必应。最终,高效的反馈循环,再加上其他因素 ------ 谷歌对大型数据处理设施的持续投入、用户切换搜索引擎的实际或主观成本 ------ 让必应始终处于落后地位。而其他试图与谷歌、必应竞争的搜索引擎,甚至从未真正入局。
3. How good are your predictions? The success of any product ultimately depends on what you get for what you pay. If consumers are offered two similar products at the same price, they will generally choose the one they perceive to be of higher quality.
第三,你的预测质量如何?任何产品的成功,最终都取决于其性价比。如果消费者面临两款定价相同、功能相似的产品,通常会选择自己认为质量更高的那一款。
Prediction quality, as we've already noted, is often easy to assess. In radiology, search, advertising, and many other contexts, companies can design AIs with a clear, single metric for quality: accuracy. As in other industries, the highest-quality products benefit from higher demand. AI-based products are different from others, however, because for most other products, better quality costs more, and sellers of inferior goods survive by using cheaper materials or less-expensive manufacturing processes and then charging lower prices. This strategy isn't as feasible in the context of AI. Because AI is software-based, a low-quality prediction is as expensive to produce as a high-quality one, making discount pricing unrealistic. And if the better prediction is priced the same as the worse one, there is no reason to purchase the lower-quality one.
正如我们此前所说,预测质量往往易于评估。在放射学、搜索引擎、广告等诸多领域,企业能为人工智能模型设定清晰、单一的质量衡量标准:准确率。和其他行业一样,质量越高的产品,市场需求也越高。但人工智能产品与传统产品存在本质区别:对于大多数传统产品而言,质量越高,生产成本也越高,而劣质产品的商家可以通过使用廉价原材料、简化生产工艺来降低成本,进而以低价抢占市场。这一策略在人工智能领域却难以奏效。因为人工智能是基于软件的,生成低质量的预测结果与高质量的预测结果,所需的生产成本相差无几,低价策略缺乏现实基础。而如果两款人工智能产品定价相同,其中一款的预测质量更优,消费者就没有理由选择质量较差的那一款。
For Google, this is another factor explaining why its lead in search may be unassailable. Competitors' predictions often look pretty similar to Google's. Enter the word "weather" into Google or Bing, and the results will be much the same: forecasts will pop up first. But if you enter a less common term, differences may emerge. If you type in, say, "disruption," Bing's first page will usually show dictionary definitions, while Google provides both definitions and links to research papers on the topic of disruptive innovation. Although Bing can perform as well as Google for some text queries, for others it's less accurate in predicting what consumers are looking for. And there are few if any other search categories where Bing is widely seen as superior.
对谷歌而言,这是其在搜索领域的领先地位难以被撼动的另一重要原因。竞争对手的预测结果,表面上看与谷歌相差无几。在谷歌和必应中输入关键词 "天气",得到的结果基本一致 ------ 都会优先显示天气预报。但如果输入较为冷门的词汇,二者的差距就会显现。例如,输入 "颠覆性变革",必应的首页通常只会显示该词汇的词典释义,而谷歌不仅会给出释义,还会附上颠覆性创新相关的研究论文链接。尽管必应在部分文本检索中的表现能与谷歌持平,但在其他检索场景中,其对用户搜索意图的预测准确率更低,而且几乎没有哪个搜索品类中,必应的表现被普遍认为优于谷歌。
CATCHING UP
迎头赶上
The bottom line is that in AI, an early mover can build a scale-based competitive advantage if feedback loops are fast and performance quality is clear. So what does this mean for late movers? Buried in the three questions are clues to two ways in which a late entrant can carve out its own space in the market. Would-be contenders needn't choose between these approaches; they can try both.
归根结底,在人工智能领域,如果反馈循环效率高、产品性能衡量标准清晰,先发企业就能打造出基于规模的竞争优势。那么,这对行业后来者意味着什么?从上述三个问题中,能梳理出行业后来者在市场中开辟专属空间的两种方法,潜在的竞争者不必二选一,也可以双管齐下。
Identify and secure alternative data sources. In some markets for prediction tools, there may be reservoirs of potential training data that incumbents have not already captured. Going back to the example of radiology, tens of thousands of doctors are each reading thousands of scans a year, meaning that hundreds of millions (or even billions) of new data points are available.
挖掘并掌握替代性数据源。在部分预测工具市场中,可能还存在行业头部企业尚未挖掘的潜在训练数据资源。再以放射学领域为例,全球数万名医生每年都会解读数千张医学影像,这意味着该领域每年会产生数亿甚至数十亿个新的数据点。
Early entrants will have training data from a few hundred radiologists. Of course, once their software is running in the field, the number of scans and the amount of feedback in their database will increase substantially, but the billions of scans previously analyzed and verified represent an opportunity for laggards to catch up, assuming they are able to pool the scans and analyze them in the aggregate. If that's the case, they might be able to develop an AI that makes good-enough predictions to go to market, after which they too can benefit from feedback.
先发企业的训练数据可能仅来自数百名放射科医生。当然,一旦其软件投入实际应用,数据库中的医学影像数量和反馈数据量会大幅增加,但此前全球医生已分析和验证过的数十亿张医学影像,为行业后来者提供了赶超的机会,只要后来者能整合这些影像数据并进行综合分析。如果能做到这一点,后来者就能研发出符合商业应用标准的人工智能模型,入局市场后,也能借助反馈数据实现模型的持续优化。
Latecomers could also consider training an AI using pathology or autopsy data rather than human diagnoses. That strategy would enable them to reach the quality threshold sooner (because biopsies and autopsies are more definitive than body scans), though the subsequent feedback loop would be slower.
行业后来者还可以考虑用病理诊断或尸检数据,而非人类医生的诊断数据来训练人工智能模型。这一策略能让模型更快达到商业应用的质量标准(因为活检和尸检的诊断结果,比医学影像的诊断结果更精准),尽管后续的反馈循环效率会更低。
Alternatively, instead of trying to find untapped sources of training data, latecomers could look for new sources of feedback data that enable faster learning than what incumbents are using. (BenchSci is an example of a company that has succeeded in doing this.) By being first with a novel supply of faster feedback data, the newcomer can then learn from the actions and choices of its users to make its product better. But in markets where feedback loops are already fairly rapid and where incumbents are operating at scale, the opportunities for pulling off this approach will be relatively limited. And significantly faster feedback would likely trigger a disruption of current practices, meaning that the new entrants would not really be competing with established companies but instead displacing them.
此外,后来者也可以放弃挖掘未被开发的训练数据,转而寻找新的反馈数据来源,让模型的学习速度超过行业头部企业。(BenchSci 就是成功做到这一点的企业案例。)如果能率先掌握更高效的反馈数据来源,后来者就能从用户的行为和选择中学习,持续优化产品。但在反馈循环本就高效、且头部企业已形成规模优势的市场中,这一方法的实施空间相对有限。而如果新的反馈数据能带来效率的大幅提升,往往会颠覆行业现有的运营模式,这意味着行业新进入者并非与头部企业展开竞争,而是会取代它们的市场地位。
Differentiate the prediction. Another tactic that can help late entrants become competitive is to redefine what makes a prediction "better," even if only for some customers. In radiology, for example, such a strategy could be possible if there is market demand for different types of predictions. Early entrants most likely trained their algorithms with data from one hospital system, one type of hardware, or one country. By using training data (and then feedback data) from another system or another country, the newcomer could customize its AI for that user segment if it is sufficiently distinct. If, say, urban Americans and people in rural China tend to experience different health conditions, then a prediction machine built to diagnose one of those groups might not be as accurate for diagnosing patients in the other group.
实现预测的差异化。能帮助行业后来者提升竞争力的另一策略,是重新定义 "优质" 预测的标准,即便这一标准仅适用于部分客户。例如,在放射学领域,如果市场对不同类型的预测存在需求,这一策略就具备实施的可能性。先发企业的算法训练数据,很可能来自某一家医院体系、某一类医疗设备,或是某一个国家。如果行业后来者能利用其他医院体系或其他国家的训练数据(以及后续的反馈数据),为特征鲜明的特定用户群体定制人工智能模型,就能实现差异化竞争。例如,美国城市人群和中国农村人群的常见疾病存在差异,为其中一个群体研发的诊断预测模型,对另一个群体的诊断准确率可能会大打折扣。
Creating predictions that rely on data coming from a particular type of hardware could also provide a market opportunity, if that business model results in lower costs or increases accessibility for customers. Many of today's AIs for radiology draw upon data from the most widely used X-ray machines, scanners, and ultrasound devices made by GE, Siemens, and other established manufacturers. However, if the algorithms are applied to data from other machines, the resulting predictions may be less accurate. Thus a late entrant could find a niche by offering a product tailored to that other equipment, which might be attractive for medical facilities to use if it is cheaper to purchase or operate or is specialized to meet the needs of particular customers.
如果依托特定类型设备产生的数据打造预测模型,能为客户降低成本或提升产品的可获得性,那么这一模式也能为企业带来市场机会。如今,多数放射学人工智能模型的训练数据,都来自通用电气、西门子等老牌制造商生产的主流 X 光机、扫描仪和超声设备。但如果将这些算法应用于其他品牌设备产生的数据,预测准确率会有所下降。因此,行业后来者可以针对这些非主流设备打造定制化的人工智能产品,深耕这一细分领域。如果这些非主流设备的采购和运营成本更低,或是能满足特定客户的需求,那么配套的人工智能产品也会吸引各类医疗机构购买。
THE POTENTIAL OF prediction machines is immense, and there is no doubt that the tech giants have a head start. But it's worth remembering that predictions are like precisely engineered products, highly adapted for specific purposes and contexts. If you can differentiate the purposes and contexts even a little, you can create a defensible space for your own product. Although the devil is in the details of how you collect and use data, your salvation rests there as well.
预测模型的潜力是巨大的,科技巨头无疑占据着先发优势。但值得注意的是,预测模型就像精密设计的产品,需要高度适配特定的用途和场景。只要你能在用途和场景上做出些许差异化,就能为自己的产品打造出难以被撼动的市场空间。数据的收集和使用方式决定了模型的成败,同时,这也是行业后来者实现赶超的关键突破口。
Nonetheless, the real key to competing successfully with Big Tech in industries powered by intelligent machines lies in a question that only a human can answer: What is it that you want to predict? Of course, figuring out the answer is not easy. Doing so necessitates a deep understanding of market dynamics and thoughtful analysis of the potential worth of specific predictions and the products and services in which they are embedded. It is therefore perhaps not surprising that the lead investor in BenchSci's Series A2 financing was not one of the many local Canadian tech investors but rather an AI-focused venture capital firm called Gradient Ventures, owned by Google.
尽管如此,在由智能机器驱动的各行业中,要想与科技巨头展开有效竞争,真正的关键在于一个只有人类才能回答的问题:你想要预测什么?当然,找到答案并非易事。这需要企业深入理解市场动态,同时审慎分析特定预测结果的潜在价值,以及这些预测所依托的产品和服务的市场潜力。因此,BenchSci 的 A2 轮融资领投方并非加拿大本土的科技投资机构,而是谷歌旗下专注于人工智能领域的风投公司 Gradient Ventures,这一结果也就不足为奇了。
AJAY AGRAWAL is the Geoffrey Taber Chair in Entrepreneurship and Innovation at the University of Toronto's Rotman School of Management and the founder of the Creative Destruction Lab. JOSHUA GANS is the Jeffrey S. Skoll Chair in Technical Innovation and Entrepreneurship at Rotman and the chief economist at the Creative Destruction Lab. AVI GOLDFARB is the Rotman Chair in Artificial Intelligence and Healthcare at Rotman and the chief data scientist at the Creative Destruction Lab. Together they authored Prediction Machines: The Simple Economics of Artificial Intelligence (Harvard Business Review Press, 2018).
阿杰伊・阿格拉沃尔:多伦多大学罗特曼管理学院杰弗里・泰伯创业与创新讲席教授,创意破坏实验室创始人。
乔舒亚・甘斯:多伦多大学罗特曼管理学院杰弗里・S・斯科尔技术创新与创业讲席教授,创意破坏实验室首席经济学家。
阿维・戈德法布:多伦多大学罗特曼管理学院人工智能与医疗健康讲席教授,创意破坏实验室首席数据科学家。
三人合著《预测机器:人工智能的简单经济学》(哈佛商业评论出版社,2018 年)。
Copyright 2020 Harvard Business Publishing. All Rights Reserved. Additional restrictions may apply including the use of this content as assigned course material. Please consult your institution's librarian about any restrictions that might apply under the license with your institution. For more information and teaching resources from Harvard Business Publishing including Harvard Business School Cases, eLearning products, and business simulations please visit hbsp.harvard.edu.
2020 年哈佛商业出版公司 版权所有。保留所有权利。本内容的使用可能受到其他限制,包括作为指定课程材料的使用。请咨询你所在机构的图书管理员,了解机构授权协议下的相关限制。如需了解哈佛商业出版公司的更多信息和教学资源,包括哈佛商学院案例、线上学习产品和商业模拟课程,请访问 hbsp.harvard.edu。
Reference
- How to Win with Machine Learning - 2020
https://hbr.org/2020/09/how-to-win-with-machine-learning.