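1. NoSQL and Redis Basics
1.1 Introduction to NoSQL
1.1.1 The evolution of databases
- The single-machine MySQL era
APP --> DAL --> MySQL
In the 1990s a typical website saw little traffic, so a single database was entirely sufficient.
Back then most sites served static HTML pages, so the server was under little pressure.
The bottlenecks of such a site were:
- The data set can grow too large to fit on one machine
- The indexes over that data (B+ tree structures) can outgrow one machine's memory
- One server cannot withstand heavy traffic (mixed reads and writes)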
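- Memcached (cache) + MySQL + vertical splitting (read/write separation)
APP --> DAL --> Cache --> MySQL1 / MySQL2 / MySQL3
Roughly 80% of a site's requests are reads. Querying the database for every one of them is expensive, so a cache is placed in front of the database to reduce its load and keep responses fast.
Evolution: optimize data structures and indexes --> file cache (IO) --> Memcached
Caching solves the read problem.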
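- Database and table sharding + horizontal splitting (MySQL cluster)
Sharding solves the write problem.
Data is partitioned according to the type of business it belongs to.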
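- The era of the data explosion
Everything is data: location, music, video, images and so on.
Traditional relational databases are no longer enough, because there is a great deal of data and it changes quickly.
When MySQL is used to store large items such as blog posts or images, the tables grow very large and performance drops.
If a dedicated database handles this kind of data, the load on MySQL becomes much smaller. Under big-data IO pressure, however, changing a table's structure becomes almost impossible.
NoSQL exists to address these pain points.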
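The read-caching stage described above can be sketched in a few lines. This is a minimal illustration only: `SlowDatabase` and `CacheAside` are hypothetical names, and a plain dict stands in for both MySQL and Memcached. It assumes the common cache-aside approach, where reads check the cache first and writes invalidate the cached entry.

```python
class SlowDatabase:
    """Hypothetical stand-in for MySQL; counts reads so cache hits are visible."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def query(self, key):
        self.reads += 1
        return self.rows.get(key)

    def update(self, key, value):
        self.rows[key] = value


class CacheAside:
    """Reads go to the cache first; writes go to the database and drop the cached copy."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # stand-in for Memcached

    def get(self, key):
        if key in self.cache:          # cache hit: the database is not touched
            return self.cache[key]
        value = self.db.query(key)     # cache miss: fall back to the database
        if value is not None:
            self.cache[key] = value
        return value

    def set(self, key, value):
        self.db.update(key, value)
        self.cache.pop(key, None)      # invalidate so the next read reloads


db = SlowDatabase({"user:1": "bruce"})
store = CacheAside(db)
for _ in range(5):
    store.get("user:1")
print(db.reads)  # 1: only the first of five reads reached the database
```

The cache only helps with reads; every `set` still hits the database, which is why the notes treat sharding as the separate answer to write load.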
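1.1.2 What is NoSQL
NoSQL stands for "Not only SQL" and is a general term for non-relational databases.
Relational databases are organized as tables, rows and columns.
With the rise of Web 2.0, traditional relational databases struggled to keep up, especially on very large, highly concurrent community sites, and exposed many problems that were hard to overcome. NoSQL has grown rapidly in today's big-data environment, and Redis is one of its fastest-growing members.
Many kinds of data, such as user profiles, social networks and geographic locations, do not need a fixed storage format and can be scaled horizontally without extra work.
Characteristics of NoSQL
- Easy to scale: records have no relationships between them, so there is no coupling and scaling out does not cause architectural problems.
- Large data volumes and high performance. A commonly quoted figure for Redis is roughly 80,000 writes and 110,000 reads per second; actual numbers depend on hardware and workload. NoSQL caches at the level of individual records, a fine-grained form of caching that performs well.
- Diverse data types. There is no need to design tables in advance; data can be stored and used as needed.
NoSQL compared with traditional databases
- Traditional relational databases: structured organization; Structured Query Language (SQL); data and relationships are stored in separate tables; data manipulation and data definition languages; strict consistency; basic transactions
- NoSQL: not only data; no fixed query language; key-value, column, document and graph stores (for example for social relationships); eventual consistency; the CAP theorem and BASE theory (multi-site active-active deployments); high performance, high availability, high scalability
The 3 Vs of the big-data era: these mainly describe the problem
- Volume: massive amounts of data
- Variety: many kinds of data
- Velocity: real-time data (4G)
The 3 highs of the big-data era: these are mainly requirements on the software
- High concurrency
- High scalability (can be split horizontally at any time)
- High performance (to protect the user experience)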
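1.1.3 Categories of NoSQL
Key-value (KV) stores
A value can be looked up quickly by its key. In general the store does not care about the format of the value and accepts whatever it is given. (Redis offers additional features on top of this.)
- Redis
- Tair
Document databases
Documents are usually stored in a JSON-like format, so the stored content is document-shaped. This makes it possible to index certain fields and provide some of the features of a relational database.
- MongoDB: a distributed document database written in C++, mainly used to handle large numbers of documents. MongoDB sits between relational and non-relational databases: among non-relational databases it is the most feature-rich and the most similar to a relational database.
- CouchDB
Column-oriented databases
Data is stored by column. The main strengths are that structured and semi-structured data are easy to store, the data compresses well, and queries that touch only one or a few columns have a large IO advantage.
- HBase
- Distributed file systems
Graph databases
The best fit for storing graph-shaped relationships. Solving the same problem with a traditional relational database performs poorly and is awkward to design and use.
What is stored is relationships, not images; examples are social feeds and ad recommendation.
- Neo4j
- InfoGrid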
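Category | Examples | Typical use cases | Data model | Strengths | Weaknesses |
---|---|---|---|---|---|
Key-value | Redis, Memcached, Oracle BDB | Content caching, mainly for high access loads over large amounts of data; also some logging systems | Key-value pairs where the key points to the value, usually implemented with a hash table | Fast lookups | Data is unstructured and usually treated only as a string or binary blob |
Column-oriented | Cassandra, HBase, Riak | Distributed file systems | Column-family storage that keeps the data of the same column together | Fast lookups, highly scalable, easier to distribute | Relatively limited functionality |
Document | CouchDB, MongoDB | Web applications (similar to key-value, but the value is structured and the database understands its content) | Key-value pairs whose value is structured data | Loose structure requirements; the schema can change and, unlike a relational database, does not have to be defined in advance | Query performance is not high, and there is no unified query syntax |
Graph | Neo4J, InfoGrid, Infinite Graph | Social networks, recommendation systems and similar; focused on building relationship graphs | Graph structure | Can use graph algorithms such as shortest-path search and N-degree relationship lookup | Often the whole graph has to be processed to get the answer, and the structure is hard to distribute across a cluster. |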
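To make the four data models in the table more concrete, here is a rough sketch that imitates each one with plain Python structures. All names and sample records are invented for illustration; real stores add indexing, persistence and distribution on top of shapes like these.

```python
from collections import deque

# Key-value: the store does not interpret the value, it only maps key -> bytes.
kv = {"session:42": b"\x01\x02opaque-bytes"}

# Document: values are structured, so lookups by field are possible
# and records do not have to share the same fields.
documents = [
    {"_id": 1, "name": "bruce", "city": "Berlin"},
    {"_id": 2, "name": "alice", "city": "Paris", "tags": ["admin"]},
]

def find(docs, field, value):
    """Return every document whose field equals value."""
    return [doc for doc in docs if doc.get(field) == value]

# Column-oriented: all values of one column sit together,
# so an aggregate over a single column reads only that column.
columns = {"name": ["bruce", "alice"], "age": [30, 25]}

def column_average(cols, name):
    return sum(cols[name]) / len(cols[name])

# Graph: what is stored is the relationships themselves.
friends = {"bruce": {"alice"}, "alice": {"bruce", "carol"}, "carol": {"alice"}}

def within_degrees(graph, start, n):
    """N-degree relationship lookup: everyone reachable in at most n hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == n:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    seen.discard(start)
    return seen

print(find(documents, "city", "Paris")[0]["name"])   # alice
print(column_average(columns, "age"))                # 27.5
print(sorted(within_degrees(friends, "bruce", 2)))   # ['alice', 'carol']
```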
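1.2 Introduction to Redis
Redis stands for Remote Dictionary Server.
It is an open-source key-value database written in ANSI C that works over the network, keeps data in memory, can persist it to disk with an append-only log, and offers client APIs for many languages. It is also described as a data structure server.
Redis periodically writes updated data to disk or appends each modifying operation to a log file, and builds master-slave replication on top of this.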
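The two persistence styles mentioned above, writing the data set out periodically versus appending every modifying operation to a log, can be contrasted with a toy model. This is only a sketch under simplifying assumptions: `TinyStore`, the JSON file formats and the file names are made up here and are not Redis's actual on-disk formats.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store with two persistence styles (not Redis's real formats)."""

    def __init__(self, directory):
        self.data = {}
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "operations.log")

    def set(self, key, value):
        self.data[key] = value
        # Append-style persistence: record every modifying operation as it happens.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(["set", key, value]) + "\n")

    def save_snapshot(self):
        # Snapshot-style persistence: dump the whole data set at one point in time.
        with open(self.snapshot_path, "w", encoding="utf-8") as snapshot:
            json.dump(self.data, snapshot)


def load_snapshot(directory):
    with open(os.path.join(directory, "snapshot.json"), encoding="utf-8") as snapshot:
        return json.load(snapshot)


def replay_log(directory):
    """Rebuild the data set by re-applying every logged operation in order."""
    data = {}
    with open(os.path.join(directory, "operations.log"), encoding="utf-8") as log:
        for line in log:
            operation, key, value = json.loads(line)
            if operation == "set":
                data[key] = value
    return data


with tempfile.TemporaryDirectory() as directory:
    store = TinyStore(directory)
    store.set("name", "bruce")
    store.save_snapshot()
    store.set("city", "Berlin")  # written after the last snapshot
    from_snapshot = load_snapshot(directory)
    from_log = replay_log(directory)

print(from_snapshot)  # {'name': 'bruce'}
print(from_log)       # {'name': 'bruce', 'city': 'Berlin'}
```

The snapshot misses the write made after it was taken, while the replayed log contains it. That is the trade-off between the two styles: snapshots are cheap to load but can lose recent writes, whereas a log keeps every write at the cost of a larger file and a slower replay.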
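Functions and uses
- In-memory storage with persistence: keeping data in memory makes it fast, but memory alone is lost on power failure, hence the persistence
- High efficiency, suitable as a high-speed cache
- Publish/subscribe messaging
- Geospatial data analysis
- Timers and counters
Features
- A variety of data types
- Persistence
- Clustering
- Transactions
How is Redis different from other key-value stores?
Redis values can hold more complex data structures, and Redis provides atomic operations on them; this is a different evolutionary path from other key-value databases. Redis data types map closely onto fundamental data structures and are exposed to the programmer directly, without additional abstraction layers.
Redis runs in memory but can persist to disk, so it trades capacity for very high read and write speed: the data set cannot be larger than physical memory. Another advantage of an in-memory database is that complex data structures are much simpler to manipulate in memory than the same structures on disk, so Redis can do a lot with little internal complexity. The on-disk formats, meanwhile, are compact and written in an append-only fashion, because they do not need to support random access.
Redis executes commands on a single thread. Because it works in memory, the CPU is rarely the limiting factor; the bottlenecks are usually memory and network bandwidth.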
1.3 Installing Redis
- Download the software
Download the package: http://redis.io
- Install the software
[root@host1 tmp]# tar -xzf redis-6.2.2.tar.gz
[root@host1 tmp]# cd redis-6.2.2/
[root@host1 redis-6.2.2]# make && make install
[root@host1 redis-6.2.2]# cd /usr/local/bin/
[root@host1 bin]# ls
redis-benchmark redis-check-aof redis-check-rdb redis-cli redis-sentinel redis-server
[root@host1 bin]# cd /usr
[root@host1 usr]# mkdir -p ./etc/redis6
[root@host1 usr]# cp /tmp/redis-6.2.2/redis.conf ./etc/redis6/
- Edit the configuration
[root@host1 usr]# cd ./etc/redis6/
[root@host1 redis6]# ls
redis.conf
[root@host1 redis6]# vim redis.conf
# run in the background
daemonize yes
- Start the service
[root@host1 ~]# /usr/local/bin/redis-server /usr/etc/redis6/redis.conf
- Connect to the server
[root@host1 ~]# /usr/local/bin/redis-cli -p 6379
127.0.0.1:6379> ping
PONG
127.0.0.1:6379> set name bruce
OK
127.0.0.1:6379> get name
"bruce"
127.0.0.1:6379> keys *
1) "name"
- Stop the service
127.0.0.1:6379> shutdown
not connected> exit
Performance testing
redis-benchmark measures performance by having many clients issue commands at the same time.
Options
No. | Option | Description | Default |
---|---|---|---|
1 | -h | Server hostname | 127.0.0.1 |
2 | -p | Server port | 6379 |
3 | -s | Server socket | |
4 | -c | Number of parallel connections | 50 |
5 | -n | Total number of requests | 100000 |
6 | -d | Data size of the SET/GET value, in bytes | 3 |
7 | -k | 1=keep alive 0=reconnect | 1 |
8 | -r | Use random keys for SET/GET/INCR and random values for SADD | |
9 | -P | Pipeline <numreq> requests | 1 |
10 | -q | Quiet mode: show only the query/sec values | |
11 | --csv | Output in CSV format | |
12 | -l (lowercase L) | Loop: run the tests forever | |
13 | -t | Only run the comma-separated list of tests, e.g. set,get | |
14 | -I (uppercase i) | Idle mode: just open N idle connections and wait | |
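When benchmarks are run from a script, the options in the table can be assembled programmatically. The helper below is a hypothetical sketch (`benchmark_command` is not part of Redis); it uses only the flags listed in the table and assumes `redis-benchmark` is on the PATH.

```python
import shlex


def benchmark_command(host="127.0.0.1", port=6379, clients=50, requests=100000,
                      tests=None, quiet=False, binary="redis-benchmark"):
    """Build an argv list for redis-benchmark from the options in the table.

    The request default of 100000 matches the count reported by the plain
    `-t set` run shown later in this section.
    """
    argv = [binary, "-h", host, "-p", str(port),
            "-c", str(clients), "-n", str(requests)]
    if tests:
        argv += ["-t", ",".join(tests)]  # e.g. ["set", "get"] -> "set,get"
    if quiet:
        argv.append("-q")                # only print the requests-per-second lines
    return argv


argv = benchmark_command(host="localhost", clients=100, requests=10000)
print(shlex.join(argv))
# redis-benchmark -h localhost -p 6379 -c 100 -n 10000
```

The resulting list can be passed to `subprocess.run(argv, capture_output=True, text=True)` against a running server; returning a list rather than a string avoids shell quoting problems.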
[root@host1 ~]# /usr/local/bin/redis-benchmark -h localhost -p 6379 -c 100 -n 10000 # most of the intermediate output is omitted
====== PING_INLINE ======
10000 requests completed in 0.10 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
Cumulative distribution of latencies:
Summary:
throughput summary: 98039.22 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.748 0.168 0.703 1.407 2.143 2.767
====== PING_MBULK ======
10000 requests completed in 0.07 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
Cumulative distribution of latencies:
Summary:
throughput summary: 147058.81 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.436 0.080 0.335 0.903 1.423 1.847
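====== SET ====== # SET requests, i.e. writes: 10,000 requests took 0.10 seconds
10000 requests completed in 0.10 seconds
100 parallel clients # 100 concurrent clients
3 bytes payload # each request writes a 3-byte value
keep alive: 1 # connections are kept open and reused instead of reconnecting for each request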
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
Cumulative distribution of latencies:
Summary:
throughput summary: 103092.78 requests per second # more than 100,000 requests handled per second
latency summary (msec):
avg min p50 p95 p99 max
0.692 0.144 0.695 1.007 1.199 2.615
====== GET ======
10000 requests completed in 0.09 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
Cumulative distribution of latencies:
Summary:
throughput summary: 117647.05 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.591 0.152 0.551 1.055 1.887 3.023
====== INCR ======
10000 requests completed in 0.08 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
100.000% <= 1.919 milliseconds (cumulative count 10000)
Cumulative distribution of latencies:
Summary:
throughput summary: 129870.13 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.495 0.136 0.407 0.935 1.439 1.919
====== LPUSH ======
10000 requests completed in 0.08 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
Cumulative distribution of latencies:
Summary:
throughput summary: 120481.93 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.579 0.072 0.583 0.967 1.135 1.527
====== RPUSH ======
10000 requests completed in 0.07 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
Cumulative distribution of latencies:
Summary:
throughput summary: 136986.30 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.489 0.072 0.303 1.127 1.503 1.975
====== LPOP ======
10000 requests completed in 0.07 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
Cumulative distribution of latencies:
Summary:
throughput summary: 136986.30 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.503 0.064 0.303 1.199 1.663 2.255
====== RPOP ======
10000 requests completed in 0.09 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
100.000% <= 2.255 milliseconds (cumulative count 10000)
Cumulative distribution of latencies:
100.000% <= 3.103 milliseconds (cumulative count 10000)
Summary:
throughput summary: 111111.11 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.654 0.072 0.631 1.231 1.711 2.255
====== SADD ======
10000 requests completed in 0.08 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
100.000% <= 1.535 milliseconds (cumulative count 10000)
Cumulative distribution of latencies:
100.000% <= 1.607 milliseconds (cumulative count 10000)
Summary:
throughput summary: 128205.12 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.496 0.144 0.367 0.991 1.223 1.535
====== HSET ======
10000 requests completed in 0.06 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
100.000% <= 2.399 milliseconds (cumulative count 10000)
Cumulative distribution of latencies:
100.000% <= 3.103 milliseconds (cumulative count 10000)
Summary:
throughput summary: 166666.67 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.356 0.064 0.319 0.527 1.239 2.399
====== SPOP ======
10000 requests completed in 0.08 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
100.000% <= 9.535 milliseconds (cumulative count 10000)
Cumulative distribution of latencies:
100.000% <= 10.103 milliseconds (cumulative count 10000)
Summary:
throughput summary: 126582.27 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.543 0.112 0.327 1.095 2.423 9.535
====== ZADD ======
10000 requests completed in 0.06 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
Cumulative distribution of latencies:
Summary:
throughput summary: 169491.53 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.353 0.088 0.311 0.575 0.983 1.327
====== ZPOPMIN ======
10000 requests completed in 0.07 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
100.000% <= 2.335 milliseconds (cumulative count 10000)
Cumulative distribution of latencies:
99.010% <= 1.703 milliseconds (cumulative count 9901)
99.400% <= 1.807 milliseconds (cumulative count 9940)
99.460% <= 1.903 milliseconds (cumulative count 9946)
99.520% <= 2.007 milliseconds (cumulative count 9952)
99.720% <= 2.103 milliseconds (cumulative count 9972)
100.000% <= 3.103 milliseconds (cumulative count 10000)
Summary:
throughput summary: 149253.73 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.455 0.088 0.311 1.159 1.703 2.335
====== LPUSH (needed to benchmark LRANGE) ======
10000 requests completed in 0.07 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
100.000% <= 1.655 milliseconds (cumulative count 10000)
Cumulative distribution of latencies:
99.976% <= 9.231 milliseconds (cumulative count 9998)
99.988% <= 9.271 milliseconds (cumulative count 9999)
99.994% <= 9.375 milliseconds (cumulative count 10000)
100.000% <= 9.375 milliseconds (cumulative count 10000)
Cumulative distribution of latencies:
99.550% <= 7.103 milliseconds (cumulative count 9955)
99.850% <= 8.103 milliseconds (cumulative count 9985)
99.960% <= 9.103 milliseconds (cumulative count 9996)
100.000% <= 10.103 milliseconds (cumulative count 10000)
Summary:
throughput summary: 23809.53 requests per second
latency summary (msec):
avg min p50 p95 p99 max
2.390 0.512 2.159 4.279 6.463 9.375
====== LRANGE_600 (first 600 elements) ======
10000 requests completed in 0.51 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
100.000% <= 8.983 milliseconds (cumulative count 10000)
Cumulative distribution of latencies:
100.000% <= 9.103 milliseconds (cumulative count 10000)
Summary:
throughput summary: 19569.47 requests per second
latency summary (msec):
avg min p50 p95 p99 max
2.774 0.976 2.679 3.551 6.031 8.983
====== MSET (10 keys) ======
10000 requests completed in 0.08 seconds
100 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
Cumulative distribution of latencies:
99.880% <= 2.103 milliseconds (cumulative count 9988)
100.000% <= 3.103 milliseconds (cumulative count 10000)
Summary:
throughput summary: 121951.22 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.648 0.160 0.583 1.199 1.815 2.183
Benchmarking a single command
[root@host1 ~]# /usr/local/bin/redis-benchmark -t set
====== SET ======
100000 requests completed in 0.76 seconds
50 parallel clients
3 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: no
Latency by percentile distribution:
0.000% <= 0.063 milliseconds (cumulative count 1)
50.000% <= 0.183 milliseconds (cumulative count 50404)
75.000% <= 0.343 milliseconds (cumulative count 75091)
87.500% <= 0.423 milliseconds (cumulative count 87543)
93.750% <= 0.495 milliseconds (cumulative count 93996)
96.875% <= 0.591 milliseconds (cumulative count 96965)
98.438% <= 0.719 milliseconds (cumulative count 98475)
99.219% <= 0.967 milliseconds (cumulative count 99229)
99.609% <= 1.287 milliseconds (cumulative count 99616)
99.805% <= 1.687 milliseconds (cumulative count 99806)
99.902% <= 2.119 milliseconds (cumulative count 99904)
99.951% <= 2.279 milliseconds (cumulative count 99955)
99.976% <= 2.535 milliseconds (cumulative count 99976)
99.988% <= 2.599 milliseconds (cumulative count 99989)
99.994% <= 2.631 milliseconds (cumulative count 99994)
99.997% <= 2.647 milliseconds (cumulative count 99997)
99.998% <= 2.663 milliseconds (cumulative count 99999)
99.999% <= 2.671 milliseconds (cumulative count 100000)
100.000% <= 2.671 milliseconds (cumulative count 100000)
Cumulative distribution of latencies:
0.537% <= 0.103 milliseconds (cumulative count 537)
53.763% <= 0.207 milliseconds (cumulative count 53763)
69.045% <= 0.303 milliseconds (cumulative count 69045)
85.098% <= 0.407 milliseconds (cumulative count 85098)
94.266% <= 0.503 milliseconds (cumulative count 94266)
97.253% <= 0.607 milliseconds (cumulative count 97253)
98.367% <= 0.703 milliseconds (cumulative count 98367)
98.845% <= 0.807 milliseconds (cumulative count 98845)
99.079% <= 0.903 milliseconds (cumulative count 99079)
99.313% <= 1.007 milliseconds (cumulative count 99313)
99.445% <= 1.103 milliseconds (cumulative count 99445)
99.552% <= 1.207 milliseconds (cumulative count 99552)
99.626% <= 1.303 milliseconds (cumulative count 99626)
99.686% <= 1.407 milliseconds (cumulative count 99686)
99.741% <= 1.503 milliseconds (cumulative count 99741)
99.786% <= 1.607 milliseconds (cumulative count 99786)
99.812% <= 1.703 milliseconds (cumulative count 99812)
99.838% <= 1.807 milliseconds (cumulative count 99838)
99.864% <= 1.903 milliseconds (cumulative count 99864)
99.889% <= 2.007 milliseconds (cumulative count 99889)
99.901% <= 2.103 milliseconds (cumulative count 99901)
100.000% <= 3.103 milliseconds (cumulative count 100000)
Summary:
throughput summary: 132100.39 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.257 0.056 0.183 0.527 0.879 2.671
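Reports like the ones above are long, so it can help to extract just the throughput per command. The following is a small sketch that assumes the output layout shown in this section (a `====== NAME ======` header followed later by a `throughput summary:` line); `parse_throughput` is a hypothetical helper, and the sample text is abbreviated from the SET and GET results above.

```python
import re

SECTION = re.compile(r"^====== (.+?) ======")
THROUGHPUT = re.compile(r"throughput summary:\s+([\d.]+) requests per second")


def parse_throughput(report):
    """Map each benchmark section name to its requests-per-second figure."""
    results = {}
    current = None
    for line in report.splitlines():
        line = line.strip()
        header = SECTION.match(line)
        if header:
            current = header.group(1)
            continue
        found = THROUGHPUT.search(line)
        if found and current is not None:
            results[current] = float(found.group(1))
    return results


sample = """\
====== SET ======
10000 requests completed in 0.10 seconds
throughput summary: 103092.78 requests per second
====== GET ======
10000 requests completed in 0.09 seconds
throughput summary: 117647.05 requests per second
"""

rates = parse_throughput(sample)
print(rates)                       # {'SET': 103092.78, 'GET': 117647.05}
print(max(rates, key=rates.get))   # GET
```

The throughput figure is simply requests divided by elapsed time. Dividing 10000 by the displayed 0.10 seconds gives 100000 rather than 103092.78 only because the elapsed time in the report is rounded to two decimals.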
Basic features
- There are 16 databases by default and database 0 is used initially; the select command switches between them
[root@host1 ~]# /usr/local/bin/redis-cli -p 6379
127.0.0.1:6379> select 3
OK
127.0.0.1:6379[3]>
- Check the size of a database
127.0.0.1:6379[3]> set name bruce
OK
127.0.0.1:6379[3]> dbsize
(integer) 1
127.0.0.1:6379[3]> select 1
OK
127.0.0.1:6379[1]> dbsize
(integer) 0
- View the contents of a database
127.0.0.1:6379[3]> get name
"bruce"
127.0.0.1:6379[3]> keys *
1) "name"
- Flush a database
127.0.0.1:6379[3]> flushdb # clears the current database
OK
127.0.0.1:6379[3]> keys *
(empty array)
127.0.0.1:6379> flushall # clears every database; use with caution
OK
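The behaviour of the numbered databases can be summarized with a toy model. `ToyRedis` below is a hypothetical in-memory imitation written for illustration; it mirrors only the commands used in the session above and does not talk to a real server.

```python
class ToyRedis:
    """Toy imitation of numbered databases: each database is an independent keyspace."""

    def __init__(self, databases=16):
        self.dbs = [dict() for _ in range(databases)]
        self.current = 0  # database 0 is selected by default

    def select(self, index):
        if not 0 <= index < len(self.dbs):
            raise IndexError("DB index is out of range")
        self.current = index
        return "OK"

    def set(self, key, value):
        self.dbs[self.current][key] = value
        return "OK"

    def get(self, key):
        return self.dbs[self.current].get(key)

    def keys(self):
        return sorted(self.dbs[self.current])

    def dbsize(self):
        return len(self.dbs[self.current])

    def flushdb(self):
        self.dbs[self.current].clear()  # only the selected database
        return "OK"

    def flushall(self):
        for db in self.dbs:             # every database
            db.clear()
        return "OK"


r = ToyRedis()
r.select(3)
r.set("name", "bruce")
print(r.dbsize())   # 1
r.select(1)
print(r.dbsize())   # 0: database 1 does not see keys written to database 3
r.select(3)
print(r.keys())     # ['name']
r.flushdb()
print(r.keys())     # []
```

The point of the model is that `select` changes which keyspace later commands apply to, `flushdb` affects only that keyspace, and `flushall` affects all of them.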