Scenario
- When developing Java programs, you often use collections to store data: deduplicating elements, adding large numbers of elements, iterating over a collection, and so on. These common operations have specific, more efficient idioms that save both time and memory.
Explanation
- If you know in advance that the number of elements will not exceed a certain count, pass that count to the constructor to set the collection's initial capacity. This avoids the time and memory cost of repeated resizing as elements are added.
```java
var first = new ArrayList<String>(10000);
```
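If the list is created elsewhere and you cannot pass a constructor argument, ArrayList.ensureCapacity gives the same one-shot resize. A minimal sketch, with placeholder element values:

```java
import java.util.ArrayList;

public class EnsureCapacityDemo {
    public static void main(String[] args) {
        ArrayList<String> items = new ArrayList<>();
        // Grow the backing array once up front instead of letting it
        // resize repeatedly while 10000 elements are added.
        items.ensureCapacity(10000);
        for (int i = 0; i < 10000; i++) {
            items.add("hello");
        }
        System.out.println(items.size()); // 10000
    }
}
```

Note that ensureCapacity is declared on ArrayList itself, not on the List interface, so it only helps when you hold a concrete ArrayList reference.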
- When iterating over a Map and you need both keys and values, call Map.entrySet() to get the key/value pairs as a Set directly. This takes less time, and is more efficient, than calling keySet() first and then looking up each value with get(key).
```java
Set<Map.Entry<String, String>> entries = hm.entrySet();
```
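For reference, a minimal sketch of both iteration styles over a small placeholder map (the timed comparison is in the example below):

```java
import java.util.HashMap;
import java.util.Map;

public class MapIterationDemo {
    public static void main(String[] args) {
        Map<String, String> hm = new HashMap<>();
        hm.put("site", "https://blog.csdn.net/infoworld");

        // entrySet: key and value come from the same entry, no extra lookup.
        for (Map.Entry<String, String> e : hm.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }

        // keySet: each value needs an additional get(key) lookup.
        for (String key : hm.keySet()) {
            System.out.println(key + " -> " + hm.get(key));
        }
    }
}
```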
- To deduplicate an ArrayList, the time taken by the common approaches, from least to most, is HashMap < HashSet < TreeMap (see the sketch after this list):
  - HashMap: check containsKey to skip duplicates, then add the keys back into the ArrayList.
  - HashSet: pass the ArrayList to the HashSet constructor, which drops the duplicates.
  - TreeMap: check containsKey to skip duplicates, then add the keys back into the ArrayList.
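A minimal, untimed sketch of the three approaches on a small placeholder list (the timed comparison is in the example below):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.TreeMap;

public class DeduplicationDemo {
    public static void main(String[] args) {
        List<Integer> input = List.of(3, 1, 2, 3, 1);

        // 1. HashMap: containsKey check, then collect the keys.
        var map = new HashMap<Integer, Integer>();
        for (var one : input) {
            if (!map.containsKey(one)) {
                map.put(one, one);
            }
        }
        List<Integer> byHashMap = new ArrayList<>(map.keySet());

        // 2. HashSet: the constructor drops duplicates directly.
        List<Integer> byHashSet = new ArrayList<>(new HashSet<>(input));

        // 3. TreeMap: same containsKey pattern; the keys come back sorted.
        var tree = new TreeMap<Integer, Integer>();
        for (var one : input) {
            if (!tree.containsKey(one)) {
                tree.put(one, one);
            }
        }
        List<Integer> byTreeMap = new ArrayList<>(tree.keySet());

        System.out.println(byHashMap + " " + byHashSet + " " + byTreeMap);
    }
}
```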
Example
```java
package test.example;
import org.apache.log4j.Logger;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;
import java.lang.reflect.Field;
import java.util.*;
import java.util.function.Consumer;
@RunWith(JUnit4.class)
public class TestCollection extends TestBase{ // TestBase (not shown) provides setUp/tearDown and the startRecord/endRecord/pDuration timing helpers
private static Logger logger = Logger.getLogger(TestCollection.class);
private final int kIterateCount = 100000;
@Before
public void setUp() {
super.setUp(logger);
}
@After
public void tearDown() {
super.tearDown(logger);
}
protected int getListCapacity(List<?> list){
try {
// Grab ArrayList's Class object
Class<?> arrayListClass = ArrayList.class;
// Look up the private "elementData" field that backs the list
Field elementDataField = arrayListClass.getDeclaredField("elementData");
// Make the private field accessible
// (on newer JDKs this reflective access may need --add-opens java.base/java.util=ALL-UNNAMED)
elementDataField.setAccessible(true);
// Read the field's value, i.e. the backing array
Object[] elementData = (Object[]) elementDataField.get(list);
// The array length is the current capacity
return elementData.length;
} catch (NoSuchFieldException | IllegalAccessException e) {
e.printStackTrace();
}
return 0;
}
@Test
public void testInitialCapacity(){
// Specify the initial capacity up front to improve performance
var first = new ArrayList<String>(10000);
long start = System.nanoTime();
for(int i = 0; i < 10000; ++i){
first.add("hello");
}
long end = System.nanoTime();
logger.info("first duration: " + (end-start));
logger.info("first items size is : " + first.size());
logger.info("first items capacity is : " + getListCapacity(first));
var second = new ArrayList<String>();
start = System.nanoTime();
for(int i = 0; i < 10000; ++i){
second.add("hello");
}
end = System.nanoTime();
logger.info("second duration: " + (end-start));
logger.info("second items2 size is :" + second.size());
logger.info("second items2 capacity is :" + getListCapacity(second));
}
@Test
public void testIteratorHashMap(){
// HashMap hashes its keys, so get(key) is O(1). Iterating with keySet is therefore
// not much slower than entrySet, and can sometimes even be faster.
HashMap<String,String> hm = new HashMap<>();
iteratorMap(hm);
}
@Test
public void testIteratorTreeMap(){
// TreeMap is a red-black tree, so lookups are O(log n): the more entries, the longer get(key) takes.
// Iterating with keySet is therefore much slower than entrySet.
TreeMap<String,String> hm = new TreeMap<>();
iteratorMap(hm);
}
@Test
public void testIteratorMap(){
logger.info("===================== testIteratorHashMap =========================");
testIteratorHashMap();
logger.info("===================== testIteratorTreeMap =========================");
testIteratorTreeMap();
}
public void iteratorMap(Map<String,String> hm){
for(int i = 0; i< 1000000L; ++i){
hm.put("website-"+i,"https://blog.csdn.net/infoworld");
}
Set<Map.Entry<String, String>> entries = hm.entrySet();
StringBuilder sb = new StringBuilder();
var start = startRecord();
for(var item: entries){
sb.append(item.getKey());
sb.append(item.getValue());
}
var end = endRecord();
pDuration(logger,start,end,"EntrySet");
// logger.info(sb.toString());
var keys = hm.keySet();
sb = new StringBuilder();
start = startRecord();
for(var key: keys){
sb.append(key);
sb.append(hm.get(key));
}
end = endRecord();
pDuration(logger,start,end,"KeySet");
// logger.info(sb.toString());
}
@Test
public void testRemoveDuplicationTreeMap(){
Consumer<List<Integer>> func = lists->{
for(int i = 0; i< kIterateCount; ++i){
var value = (int)(Math.random()*100);
lists.add(value);
}
};
List<Integer> lists = new ArrayList<>();
long count = 0;
// Deduplicate using TreeMap
logger.info("===================== TreeMap =========================");
for(int i = 0; i< 1000; ++i){
lists.clear();
func.accept(lists);
var map = new TreeMap<Integer,Integer>();
var first = startRecord();
for(var one: lists){
if(!map.containsKey(one))
map.put(one,one);
}
lists.clear();
var keys = map.keySet();
for(var key : keys)
lists.add(key);
var second = endRecord();
// Keep a running average of the elapsed time (each new sample is averaged with the running value).
count = (count > 0)?(count + second - first)/2:(second - first);
}
logger.info("hashmap duration: "+ count);
// for(var one : lists)
// logger.info("number is: "+one);
}
@Test
public void testRemoveDuplicationHashMap(){
Consumer<List<Integer>> func = lists->{
for(int i = 0; i< kIterateCount; ++i){
var value = (int)(Math.random()*100);
lists.add(value);
}
};
List<Integer> lists = new ArrayList<>();
long count = 0;
// Deduplicate using HashMap
logger.info("===================== HashMap =========================");
for(int i = 0; i< 1000; ++i){
lists.clear();
func.accept(lists);
var map = new HashMap<Integer,Integer>();
var first = startRecord();
for(var one: lists){
if(!map.containsKey(one))
map.put(one,one);
}
lists.clear();
var keys = map.keySet();
for(var key : keys)
lists.add(key);
var second = endRecord();
count = (count > 0)?(count + second - first)/2:(second - first);
}
logger.info("hashmap duration: "+ count);
// for(var one : lists)
// logger.info("number is: "+one);
}
@Test
public void testRemoveDuplicationSet(){
Consumer<List<Integer>> func = lists->{
for(int i = 0; i< kIterateCount; ++i){
var value = (int)(Math.random()*100);
lists.add(value);
}
};
List<Integer> lists = new ArrayList<>();
long count = 0;
// Deduplicate using HashSet
logger.info("=================== Set ===========================");
for(int i = 0; i< 1000; ++i){
lists.clear();
func.accept(lists);
var sets = new HashSet<Integer>(lists);
var first1 = startRecord();
lists.clear();
lists.addAll(sets);
var second1 = endRecord();
count = (count > 0)?(count + second1 - first1)/2:(second1 - first1);
}
logger.info("set duration: "+ count);
// for(var one : lists)
// logger.info("number is: "+one);
}
@Test
public void testRemoveDuplication(){
// 1. Deduplicating with HashSet is not faster than HashMap, but it is much faster than TreeMap.
logger.info("Time unit is nanosecond!");
testRemoveDuplicationHashMap();
testRemoveDuplicationTreeMap();
testRemoveDuplicationSet();
}
}
```
Output
- testRemoveDuplication: deduplicates a list of 100000 elements, loops 1000 times, and averages the elapsed time, comparing HashMap, TreeMap, and HashSet:

```
0 [main] INFO test.example.TestCollection - Time unit is nanosecond!
0 [main] INFO test.example.TestCollection - ===================== HashMap =========================
2101 [main] INFO test.example.TestCollection - hashmap duration: 278803
2101 [main] INFO test.example.TestCollection - ===================== TreeMap =========================
7361 [main] INFO test.example.TestCollection - treemap duration: 3430520
7361 [main] INFO test.example.TestCollection - =================== Set ===========================
10182 [main] INFO test.example.TestCollection - set duration: 356707
```

- testInitialCapacity: adds 10000 elements to an ArrayList created with an initial capacity of 10000 and to one created with the default capacity, and compares the elapsed time:

```
0 [main] INFO test.example.TestCollection - first duration: 171200
6 [main] INFO test.example.TestCollection - first items size is : 10000
6 [main] INFO test.example.TestCollection - first items capacity is : 10000
6 [main] INFO test.example.TestCollection - second duration: 214400
6 [main] INFO test.example.TestCollection - second items2 size is :10000
6 [main] INFO test.example.TestCollection - second items2 capacity is :14053
```

- testIteratorMap: iterates over 1000000 map entries, comparing entrySet (key and value together) against keySet followed by get(key):

```
0 [main] INFO test.example.TestCollection - ===================== testIteratorHashMap =========================
191 [main] INFO test.example.TestCollection - EntrySet duration: 58433200
256 [main] INFO test.example.TestCollection - KeySet duration: 61964700
256 [main] INFO test.example.TestCollection - ===================== testIteratorTreeMap =========================
635 [main] INFO test.example.TestCollection - EntrySet duration: 85129200
837 [main] INFO test.example.TestCollection - KeySet duration: 200515700
```