
Action Operators countByKey and countByValue: Usage Examples
1. countByKey Usage Example
Applied to an RDD in (K, V) format, countByKey counts how many times each key occurs. The result is collected back to the driver.
Java code:

```java
import java.util.Arrays;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

SparkConf conf = new SparkConf().setMaster("local").setAppName("CountByKeyTest");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaPairRDD<String, Integer> rdd = sc.parallelizePairs(Arrays.asList(
        new Tuple2<>("a", 1),
        new Tuple2<>("b", 2),
        new Tuple2<>("c", 3),
        new Tuple2<>("a", 4),
        new Tuple2<>("b", 5),
        new Tuple2<>("a", 6),
        new Tuple2<>("c", 7)
));
// countByKey: count how many times each key appears
Map<String, Long> map = rdd.countByKey();
map.forEach((k, v) -> System.out.println(k + ":" + v));
sc.stop();
```
Scala code:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

val conf: SparkConf = new SparkConf().setMaster("local").setAppName("CountByKeyTest")
val sc = new SparkContext(conf)
val rdd: RDD[(String, Int)] = sc.parallelize(List(
  ("a", 1),
  ("b", 2),
  ("c", 3),
  ("a", 4),
  ("b", 5),
  ("a", 6),
  ("c", 7)
))
// countByKey: count how many times each key appears
val result: collection.Map[String, Long] = rdd.countByKey()
result.foreach(println)
sc.stop()
```
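The driver-side result is simply a per-key tally. To make the semantics concrete without a Spark runtime, the same count can be sketched with plain Java streams (an illustrative stand-in for countByKey, not the Spark API; the names `pairs` and `counts` are my own):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// The same (key, value) pairs as the Spark example, held in a local list
List<Map.Entry<String, Integer>> pairs = List.of(
        Map.entry("a", 1), Map.entry("b", 2), Map.entry("c", 3),
        Map.entry("a", 4), Map.entry("b", 5), Map.entry("a", 6),
        Map.entry("c", 7));

// Group by key and count occurrences -- the per-key tally countByKey returns
Map<String, Long> counts = pairs.stream()
        .collect(Collectors.groupingBy(Map.Entry::getKey, Collectors.counting()));

counts.forEach((k, v) -> System.out.println(k + ":" + v));  // a -> 3, b -> 2, c -> 2
```

Note that the values attached to each key are ignored entirely; only key occurrences are counted.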
2. countByValue Usage Example
countByValue counts elements of the RDD by their content: for each distinct element it returns the number of times that element appears. It can be applied to both (K, V) and non-(K, V) RDDs, and the result is likewise collected back to the driver.
Java code:

```java
import java.util.Arrays;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf().setMaster("local").setAppName("CountByValueTest");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> rdd = sc.parallelize(Arrays.asList("a", "b", "c", "a", "b", "c", "a", "b", "c"));
// countByValue: count how many times each element appears
Map<String, Long> map = rdd.countByValue();
map.forEach((k, v) -> System.out.println(k + ":" + v));
sc.stop();
```
Scala code:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

val conf: SparkConf = new SparkConf().setMaster("local").setAppName("CountByValueTest")
val sc = new SparkContext(conf)
val rdd: RDD[String] = sc.parallelize(List("a", "b", "c", "a", "b", "a", "c"))
val map: collection.Map[String, Long] = rdd.countByValue()
map.foreach(println)
sc.stop()
```
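Conceptually, countByValue is countByKey applied to the elements themselves as keys. The local semantics can again be sketched with plain Java streams (an illustrative sketch, not the Spark API; the names `data` and `counts` are my own):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// The same elements as the Java Spark example, held in a local list
List<String> data = List.of("a", "b", "c", "a", "b", "c", "a", "b", "c");

// Group each element by its own value and count -- the tally countByValue returns
Map<String, Long> counts = data.stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

counts.forEach((k, v) -> System.out.println(k + ":" + v));  // each of a, b, c appears 3 times
```

The only difference from the countByKey sketch is the grouping function: the whole element, rather than a key extracted from a pair.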
- 📢 Blog homepage: https://lansonli.blog.csdn.net
- 📢 This article was written by Lansonli and first published on the CSDN blog