Chapter 02: Setting Up the Document Store with MongoDB
02 The structure of a MongoDB database
FastAPI+React Full-Stack Development 07: The Structure of a MongoDB Database
MongoDB is arguably the most widely used NoSQL database today. Its power, ease of use, and versatility make it an excellent choice for both large and small projects, and its scalability and performance let us be confident that at least the data layer of our app rests on a very solid foundation.
In the following sections, we will take a deeper dive into the basic units of MongoDB: the document, the collection, and the database. Since this book takes a bottom-up approach, we would like to start from the very bottom with an overview of the simplest data structures available in MongoDB, and then work our way up to documents, collections, and so on.
Documents
We have repeated numerous times that MongoDB is a document-oriented database, so let's take a look at what that actually means. If you are familiar with relational database tables (with columns and rows), you know that one unit of information is contained in a row, and that the columns describe that data.
In MongoDB, we can draw a rough analogy with the relational database row, but since we do not have to adhere to a fixed set of columns, the model is much more flexible. In fact, it is as flexible as you want it to be, although you probably don't want to push things too far in that direction if you want to achieve some real functionality. A document is really just an ordered set of keys and their corresponding values. As we will explore later, this structure maps onto data structures found in virtually every programming language; in Python, it is a dictionary, which lends itself perfectly to the flow of data in a web or desktop application.
The rules for creating documents are pretty simple: keys must be strings made of UTF-8 characters, with a few exceptions (the null character, for instance, is not allowed), and a document cannot contain duplicate keys. We also have to keep in mind that MongoDB is case-sensitive, so brand and Brand are two different keys. Let's take a look at the following relatively simple, valid MongoDB document, similar to the ones that we will be using throughout the chapter.
```json
{
  "_id": {
    "$oid": "62231e0a286b06fd01be579e"
  },
  "brand": "Hyundai",
  "make": "ix35",
  "year": 2012,
  "price": 9000,
  "km": 143500
}
```
Apart from the first field, _id, which is the unique ID of the document (the {"$oid": ...} wrapper is how MongoDB's Extended JSON displays an ObjectId), all of the other fields correspond to simple JavaScript Object Notation (JSON) fields: brand and make are strings (Hyundai and ix35), whereas year, price, and km (denoting the year of production, the price of the vehicle in euros, and the number of kilometers on the odometer) are numbers (integers, to be precise).
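Because a MongoDB document maps naturally onto a Python dictionary, the car document above can be explored with nothing but the standard library. The helpers reject_duplicate_keys and load_document below are hypothetical names for this illustration, not part of any MongoDB driver. They parse the Extended JSON text with json.loads and enforce two of the key rules mentioned earlier: no duplicate keys and no null character in keys. In a real app, a driver such as PyMongo would handle the BSON conversion for us.

```python
import json

# The car document from above, in MongoDB Extended JSON ({"$oid": ...} wraps the ObjectId).
CAR_JSON = """
{
  "_id": {"$oid": "62231e0a286b06fd01be579e"},
  "brand": "Hyundai",
  "make": "ix35",
  "year": 2012,
  "price": 9000,
  "km": 143500
}
"""


def reject_duplicate_keys(pairs):
    """object_pairs_hook for json.loads: build a dict, but refuse bad keys.

    By default json.loads silently keeps the *last* value of a repeated key,
    which would hide exactly the mistake the document rules forbid.
    """
    doc = {}
    for key, value in pairs:
        if "\x00" in key:
            raise ValueError(f"key {key!r} contains a null character")
        if key in doc:
            raise ValueError(f"duplicate key: {key!r}")
        doc[key] = value
    return doc


def load_document(text):
    """Parse a JSON document into a Python dict, enforcing the key rules."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)


car = load_document(CAR_JSON)
print(type(car).__name__)        # dict
print(list(car))                 # keys keep their insertion order
print(car["brand"], car["year"])  # Hyundai 2012

# Keys are case-sensitive: "brand" and "Brand" are two different keys.
print("Brand" in car)            # False

try:
    load_document('{"brand": "Hyundai", "brand": "Kia"}')
except ValueError as exc:
    print(exc)                   # duplicate key: 'brand'
```

The hook approach is handy because the duplicate check happens during parsing, before the data ever reaches a dictionary that could silently overwrite a value.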
So, what data types can we use in our documents? One of the first important decisions when designing any type of application is the choice of data types; we really do not want to use the wrong tools for the job at hand. Let's look at the most important data types in the following sections.
Strings
Strings are probably the most basic and universal data type in MongoDB, and they are used to represent all text fields in a document. Bear in mind that text fields do not have to represent only strictly textual values; in the application that we will be building, most text fields will, in fact, denote a categorical variable, such as the brand of the car or whether the car has a manual or automatic transmission. This will come in handy if you are designing a data science application that has categorical or ordinal variables. As in JSON, text fields are wrapped in double quotes. JSON files follow a dictionary-like structure of key-value pairs, whose values can be strings, numbers, arrays, Booleans, and so on. An example of a string variable called name encoded in JSON would be the following:
```json
{"name": "Marko"}
```
Text fields can be indexed to speed up searching, and they can be searched with standard regular expressions, which makes them a powerful tool capable of processing even massive amounts of text.
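The following sketch shows what a regular-expression search over a string field looks like. The filter dictionary uses MongoDB's real $regex and $options query operators. The regex_find helper is a hypothetical in-memory stand-in written with Python's re module so that the example runs without a database. Unlike the server, it cannot take advantage of an index, and it only understands the "i" option.

```python
import re

# A few hypothetical car documents, as plain Python dicts.
cars = [
    {"brand": "Hyundai", "make": "ix35", "year": 2012},
    {"brand": "hyundai", "make": "i30", "year": 2015},
    {"brand": "Honda", "make": "Civic", "year": 2010},
]

# In MongoDB, a case-insensitive "brand starts with hyu" filter is written like this:
mongo_filter = {"brand": {"$regex": "^hyu", "$options": "i"}}


def regex_find(docs, field, pattern, options=""):
    """Rough in-memory stand-in for a {field: {"$regex": ..., "$options": ...}} filter.

    Only the "i" (case-insensitive) option is handled in this sketch.
    """
    flags = re.IGNORECASE if "i" in options else 0
    compiled = re.compile(pattern, flags)
    return [
        doc for doc in docs
        if isinstance(doc.get(field), str) and compiled.search(doc[field])
    ]


spec = mongo_filter["brand"]
matches = regex_find(cars, "brand", spec["$regex"], spec["$options"])
print([doc["make"] for doc in matches])  # ['ix35', 'i30']
```

Without the "i" option, only the lowercase "hyundai" document would match, which is also a reminder that string comparisons in MongoDB are case-sensitive by default.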
Numbers
MongoDB supports different types of numbers:
- int: 32-bit signed integers
- decimal: 128-bit decimal-based floating point (Decimal128), suited to exact values such as prices
- long: 64-bit signed integers
- double: 64-bit binary floating point
Every MongoDB driver takes care of converting data types to and from the programming language it interfaces with, so we shouldn't need to worry about conversions, except in particular cases that will not be covered here.
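To make the numeric types more concrete, here is a small sketch that reports which BSON numeric type a Python value would most likely map to. The bson_numeric_type function is hypothetical. It assumes PyMongo's default encoding rules: a Python int becomes int32 when it fits and long otherwise, a float becomes double, and Decimal128 is used only when requested explicitly. Treat it as an illustration, not driver code. The last two lines show why a decimal type is worth having for values such as prices.

```python
from decimal import Decimal

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def bson_numeric_type(value):
    """Guess which BSON numeric type a Python value would be stored as.

    Assumption: mirrors PyMongo's defaults (int -> int32 if it fits, else int64;
    float -> double). Decimal128 must be requested explicitly, so Decimal is
    reported separately.
    """
    if isinstance(value, bool):
        return "bool"  # bool is a subclass of int in Python, so check it first
    if isinstance(value, int):
        if INT32_MIN <= value <= INT32_MAX:
            return "int"
        if INT64_MIN <= value <= INT64_MAX:
            return "long"
        raise OverflowError("integer does not fit in 64 bits")
    if isinstance(value, float):
        return "double"
    if isinstance(value, Decimal):
        return "decimal (needs explicit Decimal128)"
    raise TypeError(f"not a number: {value!r}")


print(bson_numeric_type(143500))  # int
print(bson_numeric_type(2**40))   # long
print(bson_numeric_type(9000.0))  # double

# Why a decimal type exists: binary doubles cannot represent 0.1 exactly.
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```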
Booleans
This is the standard Boolean value, true or false. Booleans are written without quotes, since we do not want them to be interpreted as strings.
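A quick way to see why the quotes matter is to parse both forms with Python's json module. PyMongo likewise maps BSON Booleans to Python's True and False.

```python
import json

quoted = json.loads('{"active": "true"}')  # a string that merely looks Boolean
unquoted = json.loads('{"active": true}')  # a real Boolean

print(type(quoted["active"]).__name__)    # str
print(type(unquoted["active"]).__name__)  # bool

# A non-empty string is truthy, even "false": a classic source of bugs.
print(bool(json.loads('{"active": "false"}')["active"]))  # True

# Python's True/False are written back as bare JSON literals.
print(json.dumps({"active": False}))  # {"active": false}
```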
Objects or embedded documents
This is where the magic happens. Object fields in MongoDB represent nested, or embedded, documents, and their values are other valid JSON documents. These embedded documents can themselves contain embedded documents, and this seemingly simple capability allows for complex data modeling. For example, if we wanted to embed the salesman responsible for a particular car, we could add a salesman field, as in the following example:
```json
{
  "_id": {
    "$oid": "62231e0a286b06fd01be579e"
  },
  "brand": "Hyundai",
  "make": "ix35",
  "year": 2012,
  "price": 9000,
  "km": 143500,
  "salesman": {
    "name": "Marko",
    "_id": {
      "$oid": "62231e0a286b87fd01be579e"
    },
    "active": true
  }
}
```
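In queries, MongoDB reaches into embedded documents with dot notation, for example {"salesman.name": "Marko"}. The sketch below mimics that lookup on a plain Python dict. get_path is a hypothetical helper, and the ObjectIds are simplified to hex strings so the example needs no driver.

```python
# The car document with the embedded salesman, as a Python dict.
# ObjectIds are kept as plain hex strings here to stay dependency-free.
car = {
    "_id": "62231e0a286b06fd01be579e",
    "brand": "Hyundai",
    "make": "ix35",
    "year": 2012,
    "price": 9000,
    "km": 143500,
    "salesman": {
        "name": "Marko",
        "_id": "62231e0a286b87fd01be579e",
        "active": True,
    },
}


def get_path(doc, path):
    """Follow a MongoDB-style dotted path such as "salesman.name".

    Returns None when any step along the path is missing.
    Arrays (numeric path segments) are not handled in this sketch.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


print(get_path(car, "salesman.name"))    # Marko
print(get_path(car, "salesman.active"))  # True
print(get_path(car, "salesman.email"))   # None

# The same dotted path is how a query would filter on the embedded field:
query = {"salesman.name": "Marko"}
```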
Arrays
Arrays can contain zero or more values in a list-like structure. The elements of an array can be of any MongoDB data type, including other documents. Arrays are zero-indexed and particularly suited to modeling embedded relationships: we could, for instance, store all of the comments on a blog post inside the blog post document itself, along with a timestamp and the user who made each comment. In our example, a document representing a car could contain a list of salesmen responsible for that vehicle, a list of customer requests for additional information regarding the car, and so on. Arrays can also be edited quickly with familiar operations, such as pushing and pulling elements, through MongoDB's array update operators ($push, $pull, and others).
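The sketch below illustrates the embedded-comments idea. The update dictionary has the shape of a real MongoDB update that uses the $push operator; with PyMongo, it would be passed to a collection's update_one method together with a filter. The apply_push function is a hypothetical, simplified in-memory imitation so that the example runs without a server, and the comment contents are made up.

```python
from datetime import datetime, timezone

car = {"brand": "Hyundai", "make": "ix35", "comments": []}

# MongoDB update document: append one comment to the embedded array.
update = {
    "$push": {
        "comments": {
            "user": "marko",
            "text": "Great condition for its age.",
            "created": datetime(2022, 3, 5, tzinfo=timezone.utc),
        }
    }
}


def apply_push(doc, update_doc):
    """Tiny in-memory imitation of the $push operator (top-level fields only)."""
    for field, value in update_doc["$push"].items():
        doc.setdefault(field, []).append(value)
    return doc


apply_push(car, update)
apply_push(car, {"$push": {"comments": {"user": "ana", "text": "Is it still available?"}}})

print(len(car["comments"]))         # 2
print(car["comments"][0]["user"])   # marko  (arrays are zero-indexed)
print(car["comments"][-1]["text"])  # Is it still available?
```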
ObjectIds
Every document in MongoDB has a unique _id that identifies it and serves as its primary key. By default, this is a 12-byte ObjectId that stays unique even across different machines. The field is autogenerated (usually by the driver, otherwise by the server) every time we insert a new document without one, but it can also be provided manually, something that we will not do. ObjectIds are also used extensively as keys for traditional relationships: for instance, every salesperson in our application could have a list of ObjectIds, each corresponding to a car that the person is trying to sell. The _id field is automatically indexed.
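The 12 bytes of an ObjectId are not random noise. In current MongoDB versions, they consist of a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value, and a 3-byte incrementing counter. The sketch below decodes the _id of our car document using only the standard library. parse_object_id and new_object_id are hypothetical helpers for illustration. In a real application, PyMongo's bson.ObjectId class (which exposes the creation time as generation_time) would be used instead.

```python
import itertools
import os
import time
from datetime import datetime, timezone

# State for the illustrative generator: a counter with a random start,
# and 5 random bytes that stay fixed for the lifetime of this process.
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))
_process_random = os.urandom(5)


def parse_object_id(hex_id):
    """Split a 24-character ObjectId hex string into its three parts."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random": raw[4:9].hex(),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


def new_object_id():
    """Build an ObjectId-shaped hex string (illustration only; use bson.ObjectId in practice)."""
    seconds = int(time.time()).to_bytes(4, "big")
    count = (next(_counter) % 2**24).to_bytes(3, "big")
    return (seconds + _process_random + count).hex()


parts = parse_object_id("62231e0a286b06fd01be579e")
print(parts["timestamp"])    # 2022-03-05 08:23:38+00:00
print(len(new_object_id()))  # 24
```

Because the timestamp comes first, ObjectIds created later generally sort after earlier ones, which is one reason they work well as default primary keys.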
Dates
Although JSON has no date type (dates are usually stored in it as plain strings), MongoDB's BSON format supports dates explicitly. A BSON date is a 64-bit integer representing the number of milliseconds since the Unix epoch (January 1, 1970). All dates are stored in UTC and have no time zone associated with them.
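The arithmetic behind a BSON date is simple enough to sketch by hand. The helpers below, to_bson_millis and from_bson_millis (both hypothetical names), convert a timezone-aware Python datetime into milliseconds since the epoch and back. The original offset is not preserved, which matches MongoDB storing dates in UTC. Drivers such as PyMongo perform this conversion automatically for datetime objects.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_bson_millis(dt):
    """Convert a timezone-aware datetime into milliseconds since the Unix epoch.

    Any sub-millisecond precision is dropped by the floor division.
    """
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguity")
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_bson_millis(millis):
    """Convert milliseconds since the epoch back into a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


# A listing time recorded in Central European Time (UTC+1).
listed = datetime(2022, 3, 5, 9, 23, 38, tzinfo=timezone(timedelta(hours=1)))
millis = to_bson_millis(listed)
print(millis)                    # 1646468618000
print(from_bson_millis(millis))  # 2022-03-05 08:23:38+00:00  (the offset is gone)
```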
Binary data
Binary data fields can store arbitrary binary data and are the only way to save non-UTF-8 strings to the database. These fields can be used in conjunction with MongoDB's GridFS to store files such as images, for example, although, as we will see, there are better and more cost-effective solutions for that.
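The sketch below shows why binary data needs its own type: the first bytes of a PNG file cannot be decoded as UTF-8 text. It then wraps those bytes in the {"$binary": {"base64": ..., "subType": ...}} shape that MongoDB's canonical Extended JSON uses to display binary values. to_extended_json_binary and the file name are hypothetical. With PyMongo, Python bytes values are simply stored as BSON binary data.

```python
import base64

raw = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature, not valid UTF-8

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not a valid string:", exc.reason)  # not a valid string: invalid start byte


def to_extended_json_binary(data, subtype="00"):
    """Represent bytes the way canonical Extended JSON (v2) shows BSON binary data.

    Subtype "00" is the generic binary subtype.
    """
    return {"$binary": {"base64": base64.b64encode(data).decode("ascii"), "subType": subtype}}


doc = {"filename": "thumbnail.png", "header": to_extended_json_binary(raw)}
print(doc["header"])  # {'$binary': {'base64': 'iVBORw0KGgo=', 'subType': '00'}}
```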
Other data types worth mentioning are null, which can represent a null value or a nonexistent field, and JavaScript code: we can even store JavaScript functions.
When it comes to nesting documents within documents, MongoDB supports up to 100 levels of nesting, a limit you really shouldn't be testing in your designs, at least not in the beginning.
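If you generate documents programmatically, a quick depth check can catch runaway nesting before the server rejects the document. The recursive nesting_depth function below is a hypothetical helper that counts embedded documents and arrays. It approximates the way MongoDB counts levels rather than reproducing it exactly.

```python
MAX_NESTING = 100  # MongoDB's documented limit for nested BSON documents


def nesting_depth(value):
    """Count how many levels of documents/arrays are nested inside a value.

    A flat document has depth 1; each embedded document or array adds a level.
    (An approximation for illustration; the server's exact counting may differ.)
    """
    if isinstance(value, dict):
        children = value.values()
    elif isinstance(value, list):
        children = value
    else:
        return 0
    return 1 + max((nesting_depth(child) for child in children), default=0)


car = {"brand": "Hyundai", "salesman": {"name": "Marko", "_id": {"$oid": "62231e0a286b87fd01be579e"}}}
print(nesting_depth(car))  # 3

# Build a chain of 105 embedded "child" documents under a root document.
too_deep = {}
node = too_deep
for _ in range(MAX_NESTING + 5):
    node["child"] = {}
    node = node["child"]
print(nesting_depth(too_deep) > MAX_NESTING)  # True
```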
Documents are the basic unit of data in MongoDB, and as such, they should be modeled carefully so that we can use the database's specific nature to our advantage. Documents should be as self-contained as possible, and MongoDB, in fact, encourages a good amount of data denormalization. Since MongoDB was built to give developers a flexible data structure that fits the flow of data in a web application as easily as possible, you should think in terms of objects, not tables, rows, and columns.
If a page needs to perform several different queries to get all of the data it needs and then combine the results, your application is bound to slow down. On the other hand, if a page greedily returns a bunch of data in a single query and the code then has to go through this result set to filter out the data that is actually needed, memory consumption will likely rise, potentially causing problems and slow operations. So, as almost everywhere, there is a sweet spot: a locally optimal solution, if you will.
In this book, we will be using a simple example of automobiles for sale, and the documents representing the unit (a car, really) are going to be rather straightforward.
Consider a scenario where users can post comments or reviews on these cars. The SQL-ish way to do this would be to create a many-to-many relationship: a car can have multiple user comments, and a user can leave comments or ratings on multiple cars. To retrieve all of the comments for a particular car, we would have to perform a join: take that car's primary key, look it up in the relationship table, and find all of the associated comment IDs. Finally, we would use these comment IDs to filter the table that stores all of the comments and retrieve their IDs, authors, texts, ratings, and so on.
In MongoDB, we can simply store the comments as an array of BSON objects embedded in the car document. When the user clicks on a particular car page, MongoDB performs a single find query and fetches the car data along with all of the associated comments, ready to be displayed. Of course, if we want to build a user profile page that displays all of the user's data, comments, and reviews, we wouldn't want to scan through every car in the database checking for that user's comments. In this case, it would probably be wise to have a separate collection that stores only users, their profiles, and their comments (storage is cheap!). Data modeling is the process of defining how our data will be stored and what relationships, and types of relationships, should exist between the different documents in our data.
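The trade-off described above can be sketched with plain Python dictionaries standing in for two collections, cars and users. Everything here is hypothetical and in-memory: the collection contents and the add_comment, car_page, profile_page, and comments_by_user_scan helpers. The point is that each page is served by a single lookup, at the price of writing every comment twice.

```python
# Hypothetical in-memory "collections": dicts keyed by _id.
cars = {
    "car1": {"brand": "Hyundai", "make": "ix35", "comments": []},
    "car2": {"brand": "Fiat", "make": "Punto", "comments": []},
}
users = {
    "u1": {"name": "Marko", "comments": []},
    "u2": {"name": "Ana", "comments": []},
}


def add_comment(car_id, user_id, text):
    """Write the comment twice (denormalized): into the car and into the user profile."""
    cars[car_id]["comments"].append({"user": user_id, "text": text})
    users[user_id]["comments"].append({"car": car_id, "text": text})


def car_page(car_id):
    """Single lookup: the car document already contains all of its comments."""
    return cars[car_id]


def profile_page(user_id):
    """Single lookup in the users collection, with no scan over all cars."""
    return users[user_id]


def comments_by_user_scan(user_id):
    """What we would have to do without the users collection: visit every car."""
    return [c["text"] for car in cars.values() for c in car["comments"] if c["user"] == user_id]


add_comment("car1", "u1", "Nice car")
add_comment("car1", "u2", "Is the price negotiable?")
add_comment("car2", "u1", "Low mileage!")

print(len(car_page("car1")["comments"]))                    # 2
print([c["text"] for c in profile_page("u1")["comments"]])  # ['Nice car', 'Low mileage!']
print(comments_by_user_scan("u1"))                          # ['Nice car', 'Low mileage!']
```

Both read paths return the same comments. The scan, however, touches every car document, while the profile lookup touches one user document. That is the kind of sweet spot between read cost and duplicated writes that data modeling aims for.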
Now that we have an idea of what types of fields are available in MongoDB and how we might map our business logic to a (flexible) schema, it is time to introduce collections: groups of documents that are the counterpart of tables in the SQL world.
Collections and databases
With the notion of schema flexibility already repeated several times, you might be asking yourself whether multiple collections are even necessary. Indeed, if we can store any kind of heterogeneous documents in a single collection (and MongoDB says we can), why bother with separate collections? There are several reasons:
Different kinds (structures) of documents in a single collection make development very difficult. We could add fields denoting the kind of each document, but this only brings overhead and performance issues. Besides, every application, whether web-based or not, needs to have some structure.
Getting the list of collections is much faster (by orders of magnitude) than querying a single collection for the kinds of documents it contains.
Data locality: grouping documents of the same type in one collection requires less disk seek time, and since indexes are defined per collection, querying is much more efficient.
Although a single MongoDB instance can host several databases at once, it is considered good practice to keep all of the document collections used by an application inside a single database. When we install MongoDB, three databases are created whose names cannot be used for our application database: admin, local, and config. These are built-in databases that shouldn't be replaced, so avoid accidentally giving your database one of these names.
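Since clashing with a built-in name is easy to do by accident, a tiny guard like the one below could sit in an app's configuration code. check_db_name is a hypothetical helper. The reserved names come from the text above. The forbidden characters (/, \, ., space, ", and $) are an assumption based on MongoDB's naming restrictions for Unix-like systems, and should be double-checked against the documentation for your deployment. So should the case-insensitive comparison, which assumes that names differing only in case count as a clash.

```python
RESERVED_DB_NAMES = {"admin", "local", "config"}
# Assumption: characters MongoDB rejects in database names on Unix-like systems.
FORBIDDEN_CHARS = set('/\\. "$')


def check_db_name(name):
    """Return a list of problems with a proposed application database name (empty if OK)."""
    problems = []
    if not name:
        problems.append("name is empty")
    # Assumption: names that differ only in case are treated as clashing.
    if name.lower() in RESERVED_DB_NAMES:
        problems.append(f"{name!r} clashes with a built-in database")
    bad = sorted(FORBIDDEN_CHARS.intersection(name))
    if bad:
        problems.append(f"contains forbidden characters: {bad}")
    if "\x00" in name:
        problems.append("contains a null character")
    return problems


print(check_db_name("carsDB"))   # []
print(check_db_name("admin"))    # ["'admin' clashes with a built-in database"]
print(check_db_name("my.cars"))  # ["contains forbidden characters: ['.']"]
```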
After reviewing the basic fields, units, and structures that we can use in MongoDB, it is time to learn how to set up a MongoDB database server on our computer and how to create an online account on MongoDB.com. The local setup is excellent for quick prototyping and doesn't even require an internet connection (though in 2022 that shouldn't be a problem), while Atlas, the online database-as-a-service, provides several benefits.
First, it is easy to set up: as we will see, you can literally be up and running in minutes with a generous free-tier database ready for work. Atlas takes away much of the manual setup and guarantees availability. Other benefits include the involvement of the MongoDB team (which tries to implement best practices), high security by default with firewalls and granular access control, automated backups (depending on the tier), and the possibility of being productive right away.