The perturbation (hash-spreading) function
Why use a perturbation function
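Only the low bits of the hash decide which bucket a key goes into. The perturbation function adds randomness to those bits so that elements spread more evenly across the buckets and collide less.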
Source code analysis
Here is the source code HashMap uses to compute the hash:
```java
/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower. Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.) So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```
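As you can see, hash() XORs the key's hashCode() with itself shifted right (unsigned) by 16 bits. 16 is exactly half the 32-bit length of an int, so the high 16 bits stay unchanged while the low 16 bits become the XOR of the original low and high halves. This mixes the high bits of the original hash into the low bits and increases randomness.
The result is then ANDed with the map's capacity minus one, which gives the index of the bucket where the key is stored.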
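To see the effect, here is a small sketch. spread() copies the body of HashMap.hash() shown above; bucketIndex() is a hypothetical helper standing in for the (n - 1) & hash expression HashMap uses internally. The two sample hash codes are made up and differ only above bit 16, which is exactly the situation the JDK comment describes.

```java
// Sketch: spread() mirrors the body of HashMap.hash(); bucketIndex() is a
// hypothetical helper for the (n - 1) & hash expression used inside HashMap.
class HashSpreadDemo {

    // Same formula as HashMap.hash(): XOR the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int bucketIndex(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        // Two made-up hash codes that differ only above bit 16
        // (Integer.hashCode() returns the value itself, so spread(a) hashes a).
        int a = 0x00010000;
        int b = 0x00020000;
        // Raw hash codes: both land in bucket 0.
        System.out.println(bucketIndex(a, n) + " " + bucketIndex(b, n));
        // Spread hashes: the high bits now reach the low 4 bits kept by the mask.
        System.out.println(bucketIndex(spread(a), n) + " " + bucketIndex(spread(b), n));
    }
}
```

With a 16-bucket table the raw hash codes both map to bucket 0, while the spread values map to buckets 1 and 2.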
Initial capacity
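The general rule first: the table capacity is always a power of two. If the requested capacity is not a power of two, it is rounded up to the smallest power of two greater than or equal to it.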
Let's look at the source first:
```java
/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
```
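The constructor throws an exception if the capacity is negative, clamps it to MAXIMUM_CAPACITY if it is too large, validates the load factor, and then calls tableSizeFor(). Note that the result is stored in threshold: the table itself is only allocated on the first insertion, when resize() uses this value as the initial capacity.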
The source of tableSizeFor() is:
```java
/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```
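`|=` is the compound bitwise-OR assignment: a bit of the result is 1 if it is 1 in either operand. Each shift-and-OR copies the highest set bit of cap - 1 into more of the bits below it, so after the whole sequence every bit below the highest set bit is 1, giving the form 111...1. That is exactly the smallest power of two greater than or equal to cap, minus one, so returning n + 1 yields a power of two. Starting from cap - 1 rather than cap is what keeps an exact power of two unchanged.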
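For example, if 17 (binary 10001) is passed in, the steps produce:
int n = cap - 1 = 10000
n |= n >>> 1 = 11000
n |= n >>> 2 = 11110
n |= n >>> 4 = 11111
n |= n >>> 8; // no further change for this input
n |= n >>> 16; // no further change for this input
This yields 11111, i.e. 31, and the method returns n + 1, i.e. 32.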
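You can reproduce the trace with a standalone copy of the method. This is a sketch: the class name is hypothetical, and MAXIMUM_CAPACITY is redeclared with HashMap's value of 1 << 30 so the code compiles on its own.

```java
// Sketch: a standalone copy of HashMap.tableSizeFor() so the bit smearing can be run.
class TableSizeForDemo {
    // Redeclared here with the same value HashMap uses.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;   // subtract 1 so an exact power of two maps to itself
        n |= n >>> 1;      // the highest set bit and the 1 bit below it are now 1
        n |= n >>> 2;      // ... the top 4 bits from the highest set bit down
        n |= n >>> 4;      // ... the top 8 bits
        n |= n >>> 8;      // ... the top 16 bits
        n |= n >>> 16;     // every bit below the highest set bit is now 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {0, 1, 16, 17, 100}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

It prints 0 -> 1, 1 -> 1, 16 -> 16, 17 -> 32 and 100 -> 128: an exact power of two is kept, and everything else is rounded up.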
Why must the capacity be a power of two?
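This is where the perturbation function comes in. A power of two minus one has the binary form 111...1, so hash & (n - 1) keeps exactly the low bits of the hash. That gives the same result as Math.floorMod(hash, n) but is cheaper, and every bucket from 0 to n - 1 can be reached. Because only those low bits decide the bucket, the perturbation function first mixes the high bits into them, which spreads keys more evenly and reduces collisions.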
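A quick sketch makes the contrast concrete. It counts which bucket indexes hash & (n - 1) can produce for a power-of-two size and for n = 10, a hypothetical non-power-of-two size chosen only for comparison (HashMap never uses it). With n = 10 the mask is 1001 in binary, so only 4 of the 10 buckets are ever used. The class name is hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch: which bucket indexes can "hash & (n - 1)" produce for a given n?
class MaskCoverageDemo {

    static Set<Integer> reachableBuckets(int n) {
        Set<Integer> used = new TreeSet<>();
        // 1024 consecutive hashes are enough to cover every low-bit pattern here.
        for (int hash = 0; hash < 1024; hash++) {
            used.add(hash & (n - 1));
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println("n=16: " + reachableBuckets(16)); // all of 0..15
        System.out.println("n=10: " + reachableBuckets(10)); // only 0, 1, 8, 9
        // For a power of two, the mask matches floorMod even for negative hashes.
        int h = -123456789;
        System.out.println((h & 15) == Math.floorMod(h, 16));
    }
}
```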
Load factor
```java
/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;
```
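This is HashMap's default load factor of 0.75. The resize threshold is capacity × load factor, and once the number of entries exceeds it (for the default capacity of 16, when the 13th entry is inserted) the map automatically resizes to double its capacity.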
As the constructor above shows, a custom load factor can also be passed in when the map is created.
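A smaller load factor keeps the table sparser, so collisions are less likely and lookups tend to be faster, at the cost of more memory and more frequent resizing. So if you want to trade space for time, you can set a smaller load factor.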
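The following is a hypothetical simulation, not HashMap itself. It only tracks a capacity and a threshold under the rule described above (threshold = capacity × loadFactor, double the capacity when the entry count exceeds the threshold) and reports the entry counts at which a resize would happen.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not HashMap itself: threshold = capacity * loadFactor,
// and the capacity doubles whenever the entry count exceeds the threshold.
class ResizeSimulation {

    // Returns the entry counts at which a resize would be triggered.
    static List<Integer> resizePoints(int initialCapacity, float loadFactor, int entries) {
        List<Integer> points = new ArrayList<>();
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor);
        for (int size = 1; size <= entries; size++) {
            // Same check as "if (++size > threshold) resize();" in putVal()
            if (size > threshold) {
                points.add(size);
                capacity <<= 1;
                threshold = (int) (capacity * loadFactor);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(resizePoints(16, 0.75f, 100));
        System.out.println(resizePoints(16, 0.5f, 100));
    }
}
```

With the defaults (16, 0.75f) the sketch reports resizes at 13, 25, 49 and 97 entries over 100 insertions; with 0.5f they come at 9, 17, 33 and 65, i.e. more often, which is the memory-for-speed trade-off described above.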
Splitting entries during a resize
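After a resize, the existing entries have to be redistributed into the new table. In JDK 1.7 each entry's position had to be recalculated during the transfer (including recomputing the hash when rehashing was required), which was comparatively costly. JDK 8 optimizes this: it reuses the hash already stored in each node instead of recomputing it.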
So how does JDK 8 split the entries?
```java
// Fragment from resize(): for each old bucket j, e = oldTab[j] is its first node
if (e.next == null)
    newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
    Node<K,V> loHead = null, loTail = null;
    Node<K,V> hiHead = null, hiTail = null;
    Node<K,V> next;
    do {
        next = e.next;
        if ((e.hash & oldCap) == 0) {
            if (loTail == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
        }
        else {
            if (hiTail == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
        }
    } while ((e = next) != null);
    if (loTail != null) {
        loTail.next = null;
        newTab[j] = loHead;
    }
    if (hiTail != null) {
        hiTail.next = null;
        newTab[j + oldCap] = hiHead;
    }
}
```
This is a fragment of the map's resize() method. The logic is roughly as follows:
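If the bucket holds only one node (e.next == null, so it is neither a list nor a tree), its new index is computed directly as hash & (newCap - 1);
If the bucket is a tree (TreeNode), the tree's own split() logic runs. It is not covered in detail here, but it divides the nodes by the same rule as the linked-list case below;
Otherwise the bucket is a linked list. For each node, hash & oldCap is computed: if the result is 0, the node keeps its original index j; otherwise its new index becomes j + oldCap. This works because newCap - 1 has exactly one more 1-bit than oldCap - 1, the bit worth oldCap, so that single bit of the stored hash decides where the node goes. Both resulting lists keep the nodes' original relative order.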
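The split rule can be checked over a whole range of hashes with a short sketch. The class and method names are hypothetical: newIndexBySplitRule() applies the hash & oldCap test from the resize() fragment, and ruleHolds() compares it with a full hash & (newCap - 1) recomputation.

```java
// Sketch: verifies that the "hash & oldCap" test gives the same index as a
// full recomputation with the doubled capacity.
class SplitRuleDemo {

    // New index as decided by the split rule: stay at j, or move to j + oldCap.
    static int newIndexBySplitRule(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return ((hash & oldCap) == 0) ? j : j + oldCap;
    }

    // True if the split rule matches hash & (newCap - 1) for every hash in [from, to].
    static boolean ruleHolds(int oldCap, int from, int to) {
        int newCap = oldCap << 1;
        for (int hash = from; hash <= to; hash++) {
            if (newIndexBySplitRule(hash, oldCap) != (hash & (newCap - 1))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(ruleHolds(16, -100_000, 100_000));
        // Hashes 5 (00101) and 21 (10101) share bucket 5 when oldCap = 16;
        // after doubling, 5 stays at 5 and 21 moves to 5 + 16 = 21.
        System.out.println(newIndexBySplitRule(5, 16) + " " + newIndexBySplitRule(21, 16));
    }
}
```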
CRUD
Insert
```java
/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    // Compute the hash and delegate to putVal()
    return putVal(hash(key), key, value, false, true);
}
```
put() simply delegates to putVal(). Here is the source of putVal():
```java
/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // Local variables: table tab, current node p, table length n, bucket index i
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table has not been allocated yet, create it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the bucket is empty, simply place a new node there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // The bucket already holds at least one node
    else {
        Node<K,V> e; K k;
        // The first node has the same key: remember it, its value is replaced below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Different key and the bucket is a tree: insert via putTreeVal()
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Different key and the bucket is a linked list
        else {
            // Walk the list to find the insertion point; binCount tracks the position
            for (int binCount = 0; ; ++binCount) {
                // Reached the end of the list: append the new node here
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list already held TREEIFY_THRESHOLD (8) nodes before this
                    // append, call treeifyBin(). It does not always build a tree: if
                    // the table length is below 64 (MIN_TREEIFY_CAPACITY) it resizes
                    // instead, and only otherwise converts the bin to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found a node with the same key further down the list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // An existing mapping for the key was found: replace its value (unless
        // onlyIfAbsent is set and the old value is non-null) and return the old value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the number of entries now exceeds the resize threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
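The return value and the onlyIfAbsent branch can be observed through the public API alone. In JDK 8, HashMap.putIfAbsent() calls putVal() with onlyIfAbsent = true, and because of the check `!onlyIfAbsent || oldValue == null` it still overwrites a mapping whose current value is null. The class and method names in this sketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch using only the public java.util.HashMap API.
class PutSemanticsDemo {

    // Runs a sequence of puts and records what each call returns.
    static List<Integer> returnValues(Map<String, Integer> map) {
        List<Integer> results = new ArrayList<>();
        results.add(map.put("a", 1));          // no previous mapping -> null
        results.add(map.put("a", 2));          // replaces 1 and returns it
        results.add(map.putIfAbsent("a", 3));  // non-null value is kept, returns 2
        map.put("b", null);
        results.add(map.putIfAbsent("b", 4));  // null value counts as absent: returns null, b becomes 4
        return results;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(returnValues(map)); // [null, 1, 2, null]
        System.out.println(map.get("a") + " " + map.get("b"));
    }
}
```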
Lookup
```java
/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
```
After computing the hash, get() calls getNode():
```java
/**
 * Implements Map.get and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    // Local variables
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The table exists, its length is non-zero, and the bucket for this hash is non-empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // The first node in the bucket has the same hash and an equal key
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            // Return that node
            return first;
        // There are more nodes: the bucket is a linked list or a tree
        if ((e = first.next) != null) {
            // Tree bucket
            if (first instanceof TreeNode)
                // Search the red-black tree
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Not a tree: walk the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```
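As the get() Javadoc points out, a null result is ambiguous. In JDK 8, containsKey() is implemented as getNode(hash(key), key) != null, so it tells the two cases apart. This sketch uses a hypothetical helper, describe(), to show the pattern; it also stores a null key, which hash() maps to 0 and therefore to bucket 0.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: distinguishing "no mapping" from "mapped to null" with the public API.
class GetNullDemo {

    // Hypothetical helper: explains what get() returned for the given key.
    static String describe(Map<String, String> map, String key) {
        String v = map.get(key);
        if (v != null) {
            return "value=" + v;
        }
        // get() returned null: containsKey() tells the two cases apart
        return map.containsKey(key) ? "mapped to null" : "no mapping";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("present", null);
        map.put(null, "nullKey"); // a null key is allowed; hash(null) == 0
        System.out.println(describe(map, "present"));
        System.out.println(describe(map, "missing"));
        System.out.println(describe(map, null));
    }
}
```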
Removal
```java
/**
 * Removes the mapping for the specified key from this map if present.
 *
 * @param  key key whose mapping is to be removed from the map
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}
```
Again, the hash is computed first and then removeNode() is called:
```java
/**
 * Implements Map.remove and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to match if matchValue, else ignored
 * @param matchValue if true only remove if value is equal
 * @param movable if false do not move other nodes while removing
 * @return the node, or null if none
 */
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    // Local variables
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    // The table is non-empty and the bucket for this hash is non-empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // The first node has the same hash and key: it is the node to remove
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        // Otherwise, if the bucket has more nodes (linked list or tree)
        else if ((e = p.next) != null) {
            // Tree bucket
            if (p instanceof TreeNode)
                // Look the node up in the tree
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // Linked list: walk it; p ends up as the predecessor of the match
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // Unlink the node (when matchValue is set, only if the value matches too)
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
```
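The matchValue flag is visible through the public API. In JDK 8, the single-argument remove(key) passes matchValue = false, while HashMap.remove(Object key, Object value) calls removeNode() with matchValue = true, so the mapping is only removed when the value matches as well. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch using only the public java.util.HashMap API.
class RemoveDemo {

    // Runs a sequence of removals and records what each call returns.
    static List<Object> run(Map<String, Integer> map) {
        List<Object> results = new ArrayList<>();
        map.put("k", 1);
        results.add(map.remove("k", 2)); // value does not match -> false, mapping kept
        results.add(map.remove("k", 1)); // value matches -> true, mapping removed
        map.put("k", 3);
        results.add(map.remove("k"));    // plain remove returns the old value, 3
        results.add(map.remove("k"));    // nothing left -> null
        return results;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(run(map));    // [false, true, 3, null]
        System.out.println(map.isEmpty());
    }
}
```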
The end.