2013-03-14

Java

javap

javap - The Java Class File Disassembler

A tool for disassembling Java class files.

C:\Users\Administrator>javap -help
Usage: javap <options> <classes>
where possible options include:
  -help  --help  -?        Print this usage message
  -version                 Version information 
  -v  -verbose             Print additional information
  -l                       Print line number and local variable tables
  -public                  Show only public classes and members
  -protected               Show protected/public classes and members
  -package                 Show package/protected/public classes
                           and members (default)
  -p  -private             Show all classes and members
  -c                       Disassemble the code
  -s                       Print internal type signatures
  -sysinfo                 Show system info (path, size, date, MD5 hash)
                           of class being processed
  -constants               Show static final constants
  -classpath <path>        Specify where to find user class files
  -bootclasspath <path>    Override location of bootstrap class files
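
To try the options listed above, it helps to have a small class to disassemble. The class below is a hypothetical example (the name Greeter and its members are not from the original post). After compiling it with javac, running javap -c -p Greeter should list the bytecode of all its methods, including the private one, and javap -constants Greeter should show the value of the static final constant.

```java
// Hypothetical class, used only to have something to disassemble.
final class Greeter {

    // A compile-time constant; `javap -constants` shows its value.
    static final String PREFIX = "Hello, ";

    // A private member; javap lists it only with -p / -private.
    private static String decorate(String name) {
        return PREFIX + name;
    }

    static String greet(String name) {
        return decorate(name);
    }

    public static void main(String[] args) {
        System.out.println(greet("javap"));
    }
}
```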

References

javap - The Java Class File Disassembler

2013-03-08

Solr Lucene

solr 4.1 install guide

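1. Download the Solr 4.1 package from the official Solr website.

2. Unzip solr-4.1.0.zip into the installation directory.

3. Copy the example directory under solr-4.1.0 and name the copy after the target index project, cloudatlas.

4. Go into the solr-4.1.0\cloudatlas\solr directory and copy collection1 once per index, naming each copy after its index.
collection1 -> photos, users.

5. Configure the two core instances in solr-4.1.0\cloudatlas\solr\solr.xml.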

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores defaultCoreName="photos" adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="8983" hostContext="solr">
    <core name="photos" loadOnStartup="true" instanceDir="photos\" transient="false" />
    <core name="users"  loadOnStartup="true" instanceDir="users\" transient="false" />
  </cores>
</solr>

6. Configure the IK Chinese word segmenter.

Download the IK Analyzer jar from [googlecode](https://code.google.com/p/ik-analyzer/). Create a lib directory under solr-4.1.0\cloudatlas\solr and put the downloaded IK Analyzer jar into it.

Register the lib directory in solr-4.1.0\cloudatlas\solr\photos\conf\solrconfig.xml:

<lib dir="../lib" />

Define the field type for Chinese word segmentation in solr-4.1.0\cloudatlas\solr\photos\conf\schema.xml:

 <fieldType name="text_cn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" useSmart ="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" useSmart ="false"/>
  </analyzer>
</fieldType>


A field that uses Chinese word segmentation is configured like this:

    <field name="photo_desc" type="text_cn" indexed="true" stored="true"/>

7. Comment out the elevator configuration in solrconfig.xml (our schema fields conflict with the fields that the elevate configuration expects).

<!-- Query Elevation Component

     http://wiki.apache.org/solr/QueryElevationComponent

     a search component that enables you to configure the top
     results for a given query regardless of the normal lucene
     scoring.
  -->
<searchComponent name="elevator" class="solr.QueryElevationComponent" >
  <!-- pick a fieldType to analyze queries -->
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<!-- A request handler for demonstrating the elevator component -->
<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="df">text</str>
  </lst>
  <arr name="last-components">
    <str>elevator</str>
  </arr>
</requestHandler>

8. Configure the document schema of each index.

schema.xml for photos

<?xml version="1.0" encoding="UTF-8" ?>    
<schema name="photos" version="1.5">

 <fields>        
   <field name="id" type="long" indexed="true" stored="true" required="true" multiValued="false" /> 
   <field name="user_id" type="long" indexed="true" stored="true" omitNorms="true" />   
   <field name="folder_id" type="long" indexed="true" stored="true" omitNorms="true" />   
   <field name="folder_name" type="text_cn" indexed="true" stored="true"/>
   <field name="folder_desc" type="text_cn" indexed="true" stored="true"/>
   <field name="photo_desc" type="text_cn" indexed="true" stored="true"/>
   <field name="farm" type="string" indexed="false" stored="true" omitNorms="true"/>
   <field name="bucket" type="string" indexed="false" stored="true" omitNorms="true"/>
   <field name="storage_keys" type="string" indexed="false" stored="true" omitNorms="true"/>
   <!-- catchall field, containing all other searchable text fields (implemented
        via copyField further on in this schema  -->
   <field name="text" type="text_cn" indexed="true" stored="false" multiValued="true"/>
   <field name="_version_" type="long" indexed="true" stored="true" multiValued="false" />   
 </fields>


 <!-- Field to use to determine and enforce document uniqueness. 
      Unless this field is marked with required="false", it will be a required field
   -->
 <uniqueKey>id</uniqueKey>

   <copyField source="folder_name" dest="text"/>
   <copyField source="folder_desc" dest="text"/>
   <copyField source="photo_desc" dest="text"/>  

  <types>

    <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />

    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="text_cn" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" useSmart ="false"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" useSmart ="false"/>
      </analyzer>
    </fieldType>
 </types>

    <!-- field for the QueryParser to use when an explicit fieldname is absent -->
    <defaultSearchField>text</defaultSearchField>

    <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
    <solrQueryParser defaultOperator="OR"/>

</schema>

schema.xml for users

<?xml version="1.0" encoding="UTF-8" ?>

<schema name="users" version="1.5">

 <fields>        
   <field name="id" type="long" indexed="true" stored="true" required="true" multiValued="false" /> 
   <field name="nickname" type="text_cn" indexed="true" stored="true"/>
   <field name="text" type="string" indexed="true" stored="false" multiValued="true"/>
   <field name="_version_" type="long" indexed="true" stored="true" multiValued="false" />
 </fields>


 <!-- Field to use to determine and enforce document uniqueness. 
      Unless this field is marked with required="false", it will be a required field
   -->
 <uniqueKey>id</uniqueKey>

<!--
   <copyField source="folder_name" dest="text"/>
   <copyField source="folder_desc" dest="text"/>
   <copyField source="photo_desc" dest="text"/>  
 -->

  <types>
    <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />

    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="text_cn" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" useSmart ="false"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" useSmart ="false"/>
      </analyzer>
    </fieldType>
 </types>

     <!-- field for the QueryParser to use when an explicit fieldname is absent -->
    <defaultSearchField>nickname</defaultSearchField>

    <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
    <solrQueryParser defaultOperator="OR"/>

</schema>

9. Start the Solr service (run from the solr-4.1.0\cloudatlas directory).

java -jar start.jar
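
Once the server is up, the cores can be queried over HTTP. The following is a minimal sketch that only builds a query URL. It assumes Solr runs on localhost with the port (8983) and context (solr) from the solr.xml above and that the stock /select request handler is still enabled; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a select URL for one of the cores configured above.
final class SolrQueryUrl {

    // Assumption: host, port and context match the solr.xml shown earlier.
    private static final String BASE = "http://localhost:8983/solr/";

    static String selectUrl(String core, String query) {
        try {
            // URL-encode the query so that ':' and non-ASCII text survive the request.
            return BASE + core + "/select?q=" + URLEncoder.encode(query, "UTF-8") + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("photos", "photo_desc:beach"));
        System.out.println(selectUrl("users", "nickname:alice"));
    }
}
```

Opening the printed URL in a browser is a quick way to check that a core loaded and that the text_cn fields are searchable.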

References

Solr Tutorial

2013-03-05

Java JVM

java performance diagnosis

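The top command

top -H -p pid lists all threads of the specified process, so you can see which thread uses the most CPU.

jstack pid > thread.dump writes a thread dump; find the thread with the highest CPU usage in it. The thread ID reported by top must first be converted to hexadecimal, because the dump shows it in hex (as nid=0x...).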
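
The decimal-to-hex conversion can be done with a few lines of Java. This is a small sketch; the class name and the sample thread ID are made up for illustration.

```java
// Hypothetical helper: converts a decimal thread id from `top -H`
// into the hexadecimal form used for nid in a jstack thread dump.
final class ThreadIdHex {

    static String toNid(long tid) {
        return "nid=0x" + Long.toHexString(tid);
    }

    public static void main(String[] args) {
        // 12345 is an arbitrary sample value; pass the real id as the first argument.
        long tid = args.length > 0 ? Long.parseLong(args[0]) : 12345L;
        System.out.println(toNid(tid));
    }
}
```

Searching thread.dump for the printed string leads to the stack trace of the busy thread.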

2013-03-05

Java

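Java Basics: HashMap

A hash-table-based implementation of the Map interface. This implementation provides all of the optional map operations and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.)

HashMap data members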

/**
 * The default initial capacity - MUST be a power of two.
 */
// Default initial capacity; must be a power of two
static final int DEFAULT_INITIAL_CAPACITY = 16;

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
// Maximum capacity
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
 * The load factor used when none specified in constructor.
 */
// Default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
// Array that stores the entries (the hash buckets)
transient Entry[] table;

/**
 * The number of key-value mappings contained in this map.
 */
// Number of entries stored
transient int size;

/**
 * The next size value at which to resize (capacity * load factor).
 * @serial
 */
// threshold = capacity * load factor; a resize is triggered when size reaches it
int threshold;

/**
 * The load factor for the hash table.
 *
 * @serial
 */
// Load factor
final float loadFactor;

/**
 * The number of times this HashMap has been structurally modified
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
 */
// Counts structural modifications of the HashMap; iterators use it to fail fast on concurrent modification
transient int modCount;

HashMap's default constructor

public HashMap() {
    // Construct with the default values
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
    table = new Entry[DEFAULT_INITIAL_CAPACITY];
    init();
}

// Hook that lets subclasses run extra initialization after construction
void init() {
}

The hash method

/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions.  This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

/**
 * Returns index for hash code h.
 */
// Determines which hash bucket a key goes into
static int indexFor(int h, int length) {
    // Because length is a power of two, this equals h % length for non-negative h
    return h & (length-1);
}
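
The two methods above can be exercised outside of HashMap. The sketch below copies hash and indexFor verbatim and adds a hypothetical demo class around them. It checks that masking with length-1 matches a floor modulo by length when length is a power of two, and shows how the supplemental hash spreads hash codes that differ only in their high bits.

```java
// Hypothetical demo class; hash() and indexFor() are copied from the listing above.
final class IndexForDemo {

    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of two
        int[] samples = {0, 1, 17, 31, 1 << 20, -1, Integer.MIN_VALUE};
        for (int h : samples) {
            // Math.floorMod (Java 8+) gives the non-negative remainder.
            System.out.println(h + " -> mask " + indexFor(h, length)
                    + ", floorMod " + Math.floorMod(h, length));
        }

        // Two hash codes that differ only above the low 4 bits.
        int a = 1 << 16;
        int b = 2 << 16;
        System.out.println("raw:    " + indexFor(a, length) + " vs " + indexFor(b, length));
        System.out.println("hashed: " + indexFor(hash(a), length) + " vs " + indexFor(hash(b), length));
    }
}
```

Without the supplemental hash both sample codes fall into bucket 0; after it they land in different buckets.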

The put method

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
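    // The key is null
    if (key == null)
        return putForNullKey(value);

    // Compute the hash of the key
    int hash = hash(key.hashCode());
    // Use the hash to find the index of the bucket for this key
    int i = indexFor(hash, table.length);

    // Walk the linked list of that bucket
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;

        // Match when the hashes are equal and (it is the same key object or key.equals(k))
        // This is why objects used as HashMap keys need to implement both hashCode and equals
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            // The key already exists: update the value and return the old one
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);// record that this entry was accessed
            return oldValue;
        }
    }

    // Increment the modification count
    modCount++; // modCount is not volatile, and ++ is not atomic, so this is not thread-safe!

    // The key is not in the map yet, so add a new entry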
    addEntry(hash, key, value, i);
    return null;
}

/**
 * Offloaded version of put for null keys
 */
private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}
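
The equality test in put is the reason key classes need consistent hashCode and equals methods. The sketch below uses two hypothetical key classes to show the difference, and also exercises the null-key path handled by putForNullKey.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo; RawKey and GoodKey are illustration-only classes.
final class KeyContractDemo {

    // Overrides neither equals nor hashCode, so it has identity semantics.
    static final class RawKey {
        final int id;
        RawKey(int id) { this.id = id; }
    }

    // Overrides both, so two instances with the same id are the same key.
    static final class GoodKey {
        final int id;
        GoodKey(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return id;
        }
    }

    static boolean rawLookupSucceeds() {
        Map<RawKey, String> map = new HashMap<RawKey, String>();
        map.put(new RawKey(1), "a");
        return map.get(new RawKey(1)) != null; // a different object, so no match
    }

    static boolean goodLookupSucceeds() {
        Map<GoodKey, String> map = new HashMap<GoodKey, String>();
        map.put(new GoodKey(1), "a");
        return map.get(new GoodKey(1)) != null; // equal by hashCode and equals
    }

    static String nullKeyRoundTrip() {
        Map<String, String> map = new HashMap<String, String>();
        map.put(null, "first");
        map.put(null, "second"); // replaces the value stored under the null key
        return map.get(null);
    }

    public static void main(String[] args) {
        System.out.println("RawKey lookup:  " + rawLookupSucceeds());
        System.out.println("GoodKey lookup: " + goodLookupSucceeds());
        System.out.println("null key value: " + nullKeyRoundTrip());
    }
}
```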

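The addEntry method

// Add an Entry
void addEntry(int hash, K key, V value, int bucketIndex) {
    // Fetch the current head of the bucket
    Entry<K,V> e = table[bucketIndex];
    // Make the new Entry the head of the bucket.
    // The old head is passed to the constructor, so the new Entry's next points to it.
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);

    if (size++ >= threshold)
        // Size has reached the threshold: resize and rehash
        resize(2 * table.length);// grow to twice the current size

    // If an attacker crafts many keys that land in the same bucket index,
    // the HashMap degrades into a linked list,
    // which is the basis of the HashMap collision attack
}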

An Entry is one node of the linked list.
Members and constructor of the Entry class:

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }

    // ... remaining methods omitted
}

The resize method

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // If the maximum capacity has been reached, set threshold to Integer.MAX_VALUE and return
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    // Create a bucket array of the new capacity
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    // Recompute the threshold
    threshold = (int)(newCapacity * loadFactor);
}

/**
 * Transfers all entries from current table to newTable.
 */
// Move every entry from the old buckets into the new buckets
void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    // Visit each bucket
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        // The bucket's list is not empty
        if (e != null) {
            src[j] = null;// clear the head pointer of the old bucket

            do {// walk the list

                Entry<K,V> next = e.next;
                // Compute the bucket index in the new table
                int i = indexFor(e.hash, newCapacity);
                // As in addEntry, point the entry's next at the current head
                e.next = newTable[i];
                // Make the entry the new head of the bucket
                newTable[i] = e;

                e = next;// move on to the next entry
            } while (e != null);
        }
    }
}

Iterator implementation

public Set<Map.Entry<K,V>> entrySet() {
    return entrySet0();
}

private Set<Map.Entry<K,V>> entrySet0() {
    Set<Map.Entry<K,V>> es = entrySet;
    return es != null ? es : (entrySet = new EntrySet());
}

private final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
    public Iterator<Map.Entry<K,V>> iterator() {
        return newEntryIterator();
    }
    public boolean contains(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry<K,V> e = (Map.Entry<K,V>) o;
        Entry<K,V> candidate = getEntry(e.getKey());
        return candidate != null && candidate.equals(e);
    }
    public boolean remove(Object o) {
        return removeMapping(o) != null;
    }
    public int size() {
        return size;
    }
    public void clear() {
        HashMap.this.clear();
    }
}

Iterator<Map.Entry<K,V>> newEntryIterator()   {
    return new EntryIterator();
}

private final class EntryIterator extends HashIterator<Map.Entry<K,V>> {
    public Map.Entry<K,V> next() {
        return nextEntry();
    }
}


private abstract class HashIterator<E> implements Iterator<E> {
    Entry<K,V> next;        // next entry to return
    int expectedModCount;   // For fast-fail
    int index;              // current slot
    Entry<K,V> current;     // current entry

    HashIterator() {
        // Remember the modCount at creation time
        expectedModCount = modCount;
        if (size > 0) { // advance to first entry
            Entry[] t = table;
            // Scan the buckets for the first non-empty one
            while (index < t.length && (next = t[index++]) == null)
                ;
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Entry<K,V> nextEntry() {
        // If expectedModCount differs from modCount, the map was modified somewhere else
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();

        Entry<K,V> e = next;
        if (e == null)
            throw new NoSuchElementException();

        // If the current entry has no next, advance to the next non-empty bucket (next stays null if there is none)
        if ((next = e.next) == null) {
            Entry[] t = table;
            while (index < t.length && (next = t[index++]) == null)
                ;
        }

        // Return the current entry
        current = e;
        return e;
    }

    public void remove() {
        if (current == null)
            throw new IllegalStateException();
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        Object k = current.key;
        current = null;
        HashMap.this.removeEntryForKey(k);
        expectedModCount = modCount;
    }

}
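
The modCount check in nextEntry can be observed from a single thread. The following sketch uses a hypothetical demo class and java.util.HashMap as shipped with the JDK. It removes an entry through the map while an iteration is in progress, which makes the iterator fail fast, and then removes entries through the iterator, which keeps expectedModCount in sync.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical demo class.
final class FailFastDemo {

    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Removing through the map changes modCount behind the iterator's back,
    // so the following call to next() throws.
    static boolean removeViaMapThrows() {
        Map<String, Integer> map = sample();
        Iterator<String> it = map.keySet().iterator();
        String first = it.next();
        map.remove(first);
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // Removing through the iterator updates expectedModCount, so iteration continues.
    static int removeEvenViaIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 0) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("remove via map throws: " + removeViaMapThrows());
        System.out.println("size after iterator removal: " + removeEvenViaIterator());
    }
}
```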

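Notes

HashMap is not thread-safe.
The fail-fast behavior of an iterator cannot be guaranteed.
Generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. It is therefore wrong to write a program that depends on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
If you know the size of a HashMap in advance, pass an initial capacity when constructing it to avoid repeated resize operations.
HashMap collision denial-of-service vulnerability
Apache's fix was to add a new Tomcat option, maxParameterCount, which limits the number of parameters in a single request. Its default is 10000, which does not affect applications (it is enough for most of them) yet is sufficient to mitigate the DoS attack.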
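
For the note on initial capacity, one common and deliberately conservative sizing rule is to divide the expected number of entries by the load factor and add one. The sketch below assumes the default load factor of 0.75; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for pre-sizing a HashMap.
final class PresizedMapDemo {

    // Assumes the default load factor (0.75). HashMap rounds the requested
    // capacity up to the next power of two internally.
    static int capacityFor(int expectedEntries) {
        return (int) (expectedEntries / 0.75f) + 1;
    }

    static Map<Integer, Integer> build(int expectedEntries) {
        Map<Integer, Integer> map = new HashMap<Integer, Integer>(capacityFor(expectedEntries));
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, i * i); // the threshold is never reached, so no resize is needed
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println("capacity argument for 1000 entries: " + capacityFor(1000));
        System.out.println("entries stored: " + build(1000).size());
    }
}
```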

References

Apache discloses HashTable collision denial-of-service vulnerability; Java, PHP, ASP.NET, the V8 engine and others are affected

2013-03-02

Java

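Object initialization in Java

1. Initialization order of an ordinary class

Load the class.
Set all static members to their default values (0, false, 0.0, etc.); reference types are initialized to null.
Run the static field initializers and static blocks in the order in which they appear in the class declaration. Static initialization happens only once, the first time the class is loaded and initialized.
When new Class() creates an instance, enough storage for the object is first allocated on the heap.
That storage is zeroed, which sets every primitive field of the instance to its default value and every reference field to null.
Initialize the instance members, in the order in which they are declared in the class.
Run the constructor.

2. Initialization order in an inheritance hierarchy

Run the static initialization of the parent class.
Run the static initialization of the child class.
Run the instance member initialization of the parent class.
Run the constructor of the parent class.
Run the instance member initialization of the child class.
Run the constructor of the child class.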

code:

class Who{

    public Who(String name){
        System.out.println(name + " is here!");
    }

}



class Child extends Parent{

    Who w3 = new Who("Child instance");

    static{
        System.out.println("Child static block 1");
    }

    static Who w4 = new Who("Child static");

    static{
        System.out.println("Child static block 2");
    }

    public Child(){
        System.out.println("Child construct");
    }

}


class Parent{

    Who w2 = new Who("Parent instance");

    static{
        System.out.println("Parent static block 1");
    }

    static Who w1 = new Who("Parent static");

    static{
        System.out.println("Parent static block 2");
    }

    public Parent(){
        System.out.println("Parent construct");
    }

}

public class Test {
    public static void main(String[] args) {
        new Child();
    }
}

Output:

Parent static block 1
Parent static is here!
Parent static block 2
Child static block 1
Child static is here!
Child static block 2
Parent instance is here!
Parent construct
Child instance is here!
Child construct

3. Starting a thread inside a static block (where the thread uses a static variable of the class)

class Foo {

    public Foo(int i) {
        System.out.println("foo" + i + " construct ");
    }
}

class Example {

    static {
        System.out.println("static block 1 ");
        // compile error
        // System.out.println("Print " + foo);

        new Thread(new Runnable() {

            @Override
            public void run() {
                System.out.println("Thread1 ");
                // System.out.println("Thread run whith foo2 " + foo2);
            }
        }).start();
    }


    static {
        System.out.println("static block 2 ");

        new Thread(new Runnable() {

            @Override
            public void run() {
                System.out.println("Thread2 run whith foo " + foo);
                // System.out.println("Thread run whith foo2 " + foo2);
            }
        }).start();

        try {
            System.out.println("main thread sleep ");
            Thread.sleep(1000 * 5);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    static Foo foo = new Foo(1);

    final Foo foo2 = new Foo(2);

    static {
        System.out.println("static block 3 ");
    }

    public Example() {
        System.out.println("Example construct");
    }

}

public class Test2 {

    public static void main(String[] args) {
        new Example();
    }
}

Output:

static block 1 
static block 2 
main thread sleep 
Thread1 
foo1 construct 
static block 3 
foo2 construct 
Example construct
Thread2 run whith foo Foo@10c042ab

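Conclusion:

If a thread created in a static block refers to a static member of the class, that thread blocks until the static initialization of the class has finished and only then continues. Thread1, which touches no static member, runs right away.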

5. A peculiar question from Stack Overflow

import java.util.ArrayList;
import java.util.List;

public class Test3 {
    public static void main(String[] args) {
        SomeClass.getInstance();
    }
}

class SomeClass {

    private static final SomeClass instance = new SomeClass();

    public static SomeClass getInstance() {
        return instance;
    }

    static {
        System.out.println("Someclass static init");
    }
    private static String objectName1 = "test1";
    private static String objectName2 = "test2";

    @SuppressWarnings("serial")
    private List<SomeObject> list = new ArrayList<SomeObject>() {
        {
            add(new SomeObject(objectName1));
            add(new SomeObject(objectName2));
        }
    };

    public SomeClass(){
        System.out.println("some class construct");
        System.out.println("list is " + list);
    }
}

class SomeObject {

    String name;

    SomeObject(String name) {
        this.name = name;
        System.out.println("my name is:" + name);
    }
}

Output:

my name is:null
my name is:null
some class construct
list is [SomeObject@780adb3f, SomeObject@10c042ab]
Someclass static init

An expert's answer:

Static blocks are initialized in order (so you can rely on the ones above in the ones below). By creating an instance of SomeClass as your first static initializer in SomeClass, you’re forcing an instance init during the static init phase.

So the logical order of execution of your code is:

Load class SomeClass, all static fields initially defaults (0, null, etc.)

Begin static inits

First static init creates instance of SomeClass

Begin instance inits for SomeClass instance, using current values for static fields (so objectName1 and objectName2 are null)

Load SomeObject class, all static fields initially default (you don’t have any)

Do SomeObject static inits (you don’t have any)

Create instances of SomeObject using the passed-in null values

Continue static inits of SomeClass, setting objectName1 and objectName2

To make this work as you may expect, simply put the inits for objectName1 and objectName2 above the init for instance.

As suggested moving this line:

private static final SomeClass  instance    = new SomeClass();

after these:

private  static String objectName1  ="test1";
private  static String objectName2  ="test2";

should fix the problem.
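
The reordering suggested in the answer can be checked with a reduced version of the class. The sketch below is an assumption-laden simplification: it uses a hypothetical class name, stores plain strings instead of SomeObject instances, and keeps only the parts that matter for initialization order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced variant of SomeClass with the static fields reordered.
final class FixedSomeClass {

    // Declared first, so they are assigned before the instance below is created.
    private static String objectName1 = "test1";
    private static String objectName2 = "test2";

    // Runs after the two assignments above, in textual order.
    private static final FixedSomeClass instance = new FixedSomeClass();

    private final List<String> names = new ArrayList<String>();

    private FixedSomeClass() {
        // The static fields already hold their real values here.
        names.add(objectName1);
        names.add(objectName2);
    }

    static FixedSomeClass getInstance() {
        return instance;
    }

    List<String> names() {
        return names;
    }

    public static void main(String[] args) {
        System.out.println(getInstance().names());
    }
}
```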

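6. Initialization with multiple threads

The class initialization phase is the process of executing the class constructor method <clinit>().

The <clinit>() method is generated by the compiler, which collects all class-variable assignments and all statements in static blocks (static{} blocks) of the class and merges them. The order of collection is the order in which the statements appear in the source file.
The virtual machine guarantees that the parent class's <clinit>() has finished before the child class's <clinit>() runs.
Because the parent's <clinit>() runs first, static blocks defined in the parent class execute before the variable assignments of the child class.
If a class has no static blocks and no assignments to class variables, the compiler need not generate a <clinit>() for it.
Interfaces cannot contain static blocks, but they can still contain variable initialization assignments, so an interface gets a <clinit>() method just as a class does.
The virtual machine guarantees that a class's <clinit>() method is correctly locked and synchronized in a multithreaded environment. If several threads try to initialize the same class at once, only one of them executes its <clinit>() method, and the others block until it finishes.

Example: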

public class DeadLoopDemo{

    static{

        if(true){
            System.out.println(Thread.currentThread() + "init DeadLoopDemo");
            while (true) {

            }
        }
    }

    public static void main(String[] args) {

        Runnable runable = new Runnable() {

            @Override
            public void run() {
                System.out.println(Thread.currentThread() + "start");
                DeadLoopDemo d1 = new DeadLoopDemo();
                System.out.println(Thread.currentThread() + "stop");

            }
        };

        Thread t1 = new Thread(runable);
        Thread t2 = new Thread(runable);
        t1.start();
        t2.start();
    }
}
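
The same guarantee is what makes the initialization-on-demand holder idiom work. This idiom is not part of the original example; the sketch below, with hypothetical names, is added only to illustrate that <clinit>() of the nested Holder class runs exactly once even when several threads request the instance at the same time.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo of lazy, thread-safe initialization via a nested holder class.
final class LazyHolderDemo {

    // Counts how many times the constructor has run.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    private LazyHolderDemo() {
        INIT_COUNT.incrementAndGet();
    }

    // Holder is initialized on first use; the JVM serializes its <clinit>().
    private static final class Holder {
        static final LazyHolderDemo INSTANCE = new LazyHolderDemo();
    }

    static LazyHolderDemo getInstance() {
        return Holder.INSTANCE;
    }

    // Starts several threads that all ask for the instance, then reports
    // how many times the constructor ran.
    static int constructionsAfterConcurrentAccess(int threadCount) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    getInstance();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return INIT_COUNT.get();
    }

    public static void main(String[] args) {
        System.out.println("constructions: " + constructionsAfterConcurrentAccess(8));
    }
}
```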

References

Object Initialization in Java
java-constructor-order - Stack Overflow
Thinking in Java
Understanding the Java Virtual Machine (深入理解Java虚拟机), by Zhou Zhiming

2012-12-24

DevTools Eclipse

eclipse install guide

Download the latest Eclipse Java EE edition from the Eclipse website and install it.

Plugins bundled by default:

Maven Integration for Eclipse
EGit - Git Team Provider

Must-have plugins

Subversive - SVN Team Provider
EasyShell
AnyEdit Tools
Gradle Integration for Eclipse

Optional

Findbugs Eclipse Plugin
CheckStyle

Other
Scala plugin: http://scala-ide.org/download/current.html

References

27 Best Free Eclipse Plug-ins for Java Developer to be Productive

2012-12-24

Java JVM

java vm args

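1. Standard java options

C:\Users\Administrator>java -h
Usage: java [-options] class [args...]
           (to execute a class)
   or  java [-options] -jar jarfile [args...]
           (to execute a jar file)
where options include:
    -d32          use a 32-bit data model if available
    -d64          use a 64-bit data model if available
    -server       to select the "server" VM
    -hotspot      is a synonym for the "server" VM  [deprecated]
                  The default VM is server.

    -cp <class search path of directories and zip/jar files>
    -classpath <class search path of directories and zip/jar files>
                  A ; separated list of directories, JAR archives,
                  and ZIP archives to search for class files.
    -D<name>=<value>
                  set a system property
    -verbose[:class|gc|jni]
                  enable verbose output
    -version      print product version and exit
    -version:<value>
                  require the specified version to run
    -showversion  print product version and continue
    -jre-restrict-search | -no-jre-restrict-search
                  include/exclude user private JREs in the version search
    -? -help      print this help message
    -X            print help on non-standard options
    -ea[:<packagename>...|:<classname>]
    -enableassertions[:<packagename>...|:<classname>]
                  enable assertions with specified granularity
    -da[:<packagename>...|:<classname>]
    -disableassertions[:<packagename>...|:<classname>]
                  disable assertions with specified granularity
    -esa | -enablesystemassertions
                  enable system assertions
    -dsa | -disablesystemassertions
                  disable system assertions
    -agentlib:<libname>[=<options>]
                  load native agent library <libname>, e.g. -agentlib:hprof
                  see also, -agentlib:jdwp=help and -agentlib:hprof=help
    -agentpath:<pathname>[=<options>]
                  load native agent library by full pathname
    -javaagent:<jarpath>[=<options>]
                  load Java programming language agent, see java.lang.instrument
    -splash:<imagepath>
                  show splash screen with specified image
See http://www.oracle.com/technetwork/java/javase/documentation/index.html for more details.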

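2. Non-standard java options

C:\Users\Administrator>java -X
    -Xmixed           mixed mode execution (default)
    -Xint             interpreted mode execution only
    -Xbootclasspath:<directories and zip/jar files separated by ;>
                      set search path for bootstrap classes and resources
    -Xbootclasspath/a:<directories and zip/jar files separated by ;>
                      append to end of bootstrap class path
    -Xbootclasspath/p:<directories and zip/jar files separated by ;>
                      prepend in front of bootstrap class path
    -Xdiag            show additional diagnostic messages
    -Xnoclassgc       disable class garbage collection
    -Xincgc           enable incremental garbage collection
    -Xloggc:<file>    log GC status to a file with time stamps
    -Xbatch           disable background compilation
    -Xms<size>        set initial Java heap size
    -Xmx<size>        set maximum Java heap size
    -Xss<size>        set java thread stack size
    -Xprof            output cpu profiling data
    -Xfuture          enable strictest checks, anticipating future default
    -Xrs              reduce use of OS signals by Java/VM (see documentation)
    -Xcheck:jni       perform additional checks for JNI functions
    -Xshare:off       do not attempt to use shared class data
    -Xshare:auto      use shared class data if possible (default)
    -Xshare:on        require using shared class data, otherwise fail.
    -XshowSettings    show all settings and continue
    -XshowSettings:all
                      show all settings and continue
    -XshowSettings:vm show all vm related settings and continue
    -XshowSettings:properties
                      show all property settings and continue
    -XshowSettings:locale
                      show all locale related settings and continue

The -X options are non-standard and subject to change without notice.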

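3. java flags and their default values

The -XX:+PrintFlagsFinal option prints the names and values of all flags (the defaults, unless overridden on the command line).
java -XX:+PrintFlagsFinal

4. Common JVM options and flags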

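Options:

-server       select the "server" VM
-cp <class search path of directories and zip/jar files>
-classpath <class search path of directories and zip/jar files>
              A ; separated list of directories, JAR archives,
              and ZIP archives to search for class files.
-D<name>=<value>
              set a system property
-verbose[:class|gc|jni]
              enable verbose output
-version      print product version and exit
-agentlib:<libname>[=<options>]
              load native agent library <libname>, e.g. -agentlib:hprof
              see also -agentlib:jdwp=help and -agentlib:hprof=help
-agentpath:<pathname>[=<options>]
              load native agent library by full pathname
-javaagent:<jarpath>[=<options>]
              load Java programming language agent, see java.lang.instrument

-Xloggc:<file>    log GC status to a file with time stamps
-Xms<size>        set initial Java heap size
-Xmx<size>        set maximum Java heap size
-Xmn<size>        set the size of the young generation of the Java heap
-Xss<size>        set Java thread stack size (corresponding flag: ThreadStackSize; default 1 MB on JDK 1.5+)
-Xnoclassgc       disable class garbage collection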
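Flags:

-XX:PermSize      size of the permanent generation
-XX:MaxPermSize   maximum size of the permanent generation
-XX:SurvivorRatio ratio of Eden to Survivor space in the young generation

-XX:GCTimeRatio   target ratio of GC time to application time, 1/(1+N) for a value of N; only effective with the Parallel Scavenge collector

-XX:+DisableExplicitGC ignore garbage collections triggered by System.gc().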
-XX:+UseParNewGC      
-XX:+UseConcMarkSweepGC
-XX:+CMSPermGenSweepingEnabled
-XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSClassUnloadingEnabled
-XX:-CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70
-XX:SoftRefLRUPolicyMSPerMB=0

-XX:+PrintClassHistogram
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/opt/logs/<services>/gc.log

Debugging options
-XX:+HeapDumpOnOutOfMemoryError
-XX:+PrintFlagsFinal

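5. A standard JVM configuration for a service on an 8-core CPU

* The heap is set to 2 GB here; 4 GB is recommended for web services, and at least 2 GB for RMI services that do not cache large amounts of data
* GC logging must be enabled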

-server
-Xms2048M
-Xmx2048M
-Xmn512M
-XX:PermSize=256M
-XX:MaxPermSize=256M
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=7
-Xss1m
-XX:GCTimeRatio=19
-Xnoclassgc
-XX:+DisableExplicitGC
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSPermGenSweepingEnabled
-XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSClassUnloadingEnabled
-XX:-CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70
-XX:SoftRefLRUPolicyMSPerMB=0
-XX:+PrintClassHistogram
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/opt/logs/<services>/gc.log

Advanced JVM options

-XX:HeapDumpPath=/tmp/dis-search.hprof  
-XX:+HeapDumpOnOutOfMemoryError
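
To confirm from inside a running service that options such as the ones above were actually applied, the standard java.lang.management API can report the arguments the JVM was started with. The sketch below uses a hypothetical class name; it reads only what was passed on the command line and does not list default flag values the way -XX:+PrintFlagsFinal does.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports how the current JVM was started.
final class JvmArgsDump {

    // The options passed to the JVM itself (e.g. -Xmx..., -XX:...),
    // not the arguments of the main class.
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // The maximum heap the JVM will attempt to use, in bytes.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        for (String arg : inputArguments()) {
            System.out.println(arg);
        }
        System.out.println("max heap (MB): " + maxHeapBytes() / (1024 * 1024));
    }
}
```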


2012-12-21

MongoDB

MongoDB MapReduce

MapReduce

Usage of db.collection.mapReduce():

db.collection.mapReduce(
                     <mapfunction>,
                     <reducefunction>,
                     {
                       out: <collection>,
                       query: <document>,
                       sort: <document>,
                       limit: <number>,
                       finalize: <function>,
                       scope: <document>,
                       jsMode: <boolean>,
                       verbose: <boolean>
                     }
                   )

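Pitfalls:

When a key emitted in the map function is matched by only one document, the reduce function is not called for that key, so that case has to be handled in the finalize function.
The reduce function may be called several times for the same key, so its result must depend only on the values passed in to that call.

Suppose there is a collection topic_photos whose documents look like this: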
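
The second pitfall can be simulated without a database. The sketch below is plain Java with hypothetical names and does not use the MongoDB driver. It models a reduce function that is first applied to two halves of the emitted values and then applied again to the two partial results, and it compares summing photo_count with counting the values passed in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of re-reduce; not MongoDB code.
final class ReReduceDemo {

    static final class Value {
        final long photoId;
        final int likes;
        final int photoCount;

        Value(long photoId, int likes, int photoCount) {
            this.photoId = photoId;
            this.likes = likes;
            this.photoCount = photoCount;
        }
    }

    // sumCounts = true adds up photoCount (correct);
    // sumCounts = false uses the number of values passed in (wrong on re-reduce).
    static Value reduce(List<Value> values, boolean sumCounts) {
        long bestPhoto = 0;
        int bestLikes = 0;
        int count = 0;
        for (Value v : values) {
            if (v.likes > bestLikes) {
                bestLikes = v.likes;
                bestPhoto = v.photoId;
            }
            count += v.photoCount;
        }
        return new Value(bestPhoto, bestLikes, sumCounts ? count : values.size());
    }

    // What map would emit for n photos of one user: photo i has i likes.
    static List<Value> mapped(int n) {
        List<Value> out = new ArrayList<Value>();
        for (int i = 1; i <= n; i++) {
            out.add(new Value(i, i, 1));
        }
        return out;
    }

    // Reduce each half separately, then reduce the two partial results.
    static Value twoPass(int n, boolean sumCounts) {
        List<Value> all = mapped(n);
        List<Value> partials = new ArrayList<Value>();
        partials.add(reduce(all.subList(0, n / 2), sumCounts));
        partials.add(reduce(all.subList(n / 2, n), sumCounts));
        return reduce(partials, sumCounts);
    }

    public static void main(String[] args) {
        System.out.println("summing photoCount: " + twoPass(20, true).photoCount);
        System.out.println("using values.size(): " + twoPass(20, false).photoCount);
    }
}
```

With 20 emitted values, summing photoCount yields 20, while counting the values of the final call yields 2.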

{
  "_id" : ObjectId("50b223832115fb21eca4a485"),
  "created_at" : ISODate("2012-11-25T13:56:19.548Z"),
  "last_liked" : ISODate("2012-12-10T08:44:49.276Z"),
  "likes" : 6,
  "photo_id" : NumberLong("39967821345988608"),
  "topic_id" : NumberLong("38362556125028352"),
  "user_id" : NumberLong(12449)
}

Now we want to find the most-liked users, along with each user's most-liked photo and that photo's number of likes.

public void groupPhotosByUser(long topicId, long startTime, long endTime){

    String map = " function(){ "
            + "        var key = this.user_id; " // use the user's user_id as the key
            + "        var value={photo_id:this.photo_id,likes:this.likes,photo_count:1,created_at:ISODate()};" // the value carries photo_id and likes; photo_count starts at 1
            + "        emit(key,value); "
            + "    };";

    String reduce = "  function (user_id, values){"
                            // map-reduce runs in parallel and reduce may be invoked repeatedly, so result must be initialized on every call
                    + "        var result = {user_id:user_id,photo_id:0,photo_count:0,likes:0,created_at:Date()}; "
                    + "     for(var i=0; i<values.length; i++){ "
                    + "         var value = values[i];          "
                    + "         if(value.likes > result.likes){ "
                        // pick the value with the most likes and record its likes and photo_id
                    + "             result.likes = value.likes; "
                    + "             result.photo_id = value.photo_id;  "
                    + "         } "
                    + "         result.photo_count += value.photo_count; "
                    // Accumulate photo_count inside the loop; do NOT use values.length, because with many values reduce runs in several passes.
                    // For example, 20 values may be reduced as two batches of 10 and the two partial results reduced again; values.length would then be 2.
                    + "     }; "
                    + "      return result; "
                    + "}";

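    String finalize = " function (user_id, result) { "
            + "        if(typeof(result.user_id) == 'undefined' ){ "
                    // A user with a single topic_photo never goes through reduce, so the value is normalized here (map already sets created_at; user_id is the field that is missing).
            + "            result = {user_id:user_id,photo_count:1,photo_id:result.photo_id,likes:result.likes,created_at:ISODate()}; "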
            + "        } "
            + "      return result; "
            + "}";

    DBObject query = QueryBuilder.start("topic_id").is(topicId)
                    .put("created_at").greaterThanEquals(new Date(startTime))
                    .lessThanEquals(new Date(endTime))
                    .get();
    String outputCollection = String.format("topic_%s_users", topicId);

    DBCollection inputCollection = getMongoDB().getCollection("topic_photos");
    MapReduceCommand mapReduceCommand = new MapReduceCommand(inputCollection, map, reduce, outputCollection, OutputType.REPLACE, query);
    mapReduceCommand.setFinalize(finalize);
    inputCollection.mapReduce(mapReduceCommand);

    // TODO: create MongoDB index
}

Debugging MongoDB

In a Java program

static{

    // Enable MongoDB logging in general
    System.setProperty("DEBUG.MONGO", "true");

    // Enable DB operation tracing
    System.setProperty("DB.TRACE", "true");
}

Use the printjson() function to print log output while MapReduce is running

var map = function(){
    var key = this.user_id;
    var value = {photo_id:this.photo_id, likes:this.likes, photo_count:1, created_at:ISODate()};
    emit(key,value);
}

var reduce = function (user_id, values){
    if (user_id == 350278) {
        printjson(values);
    }

    var result = {user_id:user_id,photo_id:0,photo_count:0,likes:0,created_at:Date()};
    for(var i=0; i<values.length; i++){
        var value = values[i];
        if(value.likes > result.likes){
            result.likes = value.likes;
            result.photo_id = value.photo_id;
        }
        result.photo_count += value.photo_count;
    };
    return result;
}

var finalize = function (user_id, result) {
    if(typeof(result.user_id) == 'undefined' ){
        // single-document key: reduce was skipped, so add the missing user_id here
        result = {user_id:user_id,photo_count:1,photo_id:result.photo_id,likes:result.likes,created_at:ISODate()};
    }
    return result;
}

var query = {"topic_id" : NumberLong("38362556125028352")}

db.runCommand({"mapreduce" : "topic_photos" ,
          "map" : map, 
          "reduce" : reduce,
          "finalize": finalize,
          "query" : query,           
          "out" : { "replace" : "topic_38362556125028352_users"}, 
          "verbose" : true
        })

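MongoDB console output

As the console output shows, the last reduce call re-reduces the results of the two previous reduce calls. The photo_count in reduce therefore cannot rely on values.length; it has to be accumulated from the values passed in.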
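The effect is easy to reproduce outside MongoDB. The following Python 3 sketch compares a reduce that counts `len(values)` with one that sums the `photo_count` carried by each value; `chunked_reduce` and its batch size are hypothetical stand-ins for however the server happens to split the work.

```python
def reduce_by_length(values):
    # Wrong: counts the entries handed to this call, not the photos behind them.
    return {"photo_count": len(values)}

def reduce_by_sum(values):
    # Right: adds up the counts each value carries, so partial results re-reduce correctly.
    return {"photo_count": sum(v["photo_count"] for v in values)}

def chunked_reduce(reduce_fn, values, chunk_size):
    """Reduce in batches, then reduce the partial results once more."""
    partials = [reduce_fn(values[i:i + chunk_size])
                for i in range(0, len(values), chunk_size)]
    return reduce_fn(partials)

emitted = [{"photo_count": 1} for _ in range(20)]
wrong = chunked_reduce(reduce_by_length, emitted, 10)
right = chunked_reduce(reduce_by_sum, emitted, 10)
print(wrong, right)
```

With 20 emitted values reduced in two batches of 10, the length-based version ends up with 2 while the sum-based version keeps the correct 20.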

Debugging MapReduce functions with the Underscore.js library

See Debugging MapReduce in MongoDB

Tips

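With MongoDB it is best to split different business domains vertically and store them in separate databases; this makes it much clearer where to look when tracking down slow queries and diagnosing problems.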
Pretty print in MongoDB shell: db.collection.find().pretty()


2012-12-21

Java JVM

Java Profile

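When diagnosing system problems, several kinds of data can be analyzed: runtime logs, exception stack traces, GC logs, thread snapshots (threaddump/javacore files), heap dump snapshots (heapdump/hprof files), and so on.

Here is a brief introduction to the standard tools for Java performance diagnosis.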

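1.jps (JVM Process Status Tool): lists all HotSpot VM processes on the specified system.
Usage: jps -v

-m prints the arguments passed to the main class's main() method when the VM process started  
-l prints the fully qualified name of the main class, or the path of the jar if a jar is being run  
-v prints the JVM options passed when the VM process started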

2.jstat: monitor GC activity

jstat -gc pid 2000  
jstat -gcutil pid 2000

[jstat - Java Virtual Machine Statistics Monitoring Tool](http://docs.oracle.com/javase/1.5.0/docs/tooldocs/share/jstat.html)

Alternatively, use top or ps aux to check overall memory usage.
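When jstat output is collected by a script, it can be handy to turn the table into a dictionary. The Python 3 sketch below assumes the two-line layout (header row plus one sample row) and the column names that `jstat -gcutil` prints on JDK 6/7; `parse_jstat`, the sample numbers and the 70% threshold are made up for illustration.

```python
def parse_jstat(output):
    """Turn a two-line jstat table (header row + one sample row) into a dict of floats."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    values = [float(v) for v in lines[1].split()]
    return dict(zip(header, values))

# Made-up sample in the column layout assumed above.
sample = """
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  45.12  67.30  71.85  59.90    120    1.532     4    0.876    2.408
"""
stats = parse_jstat(sample)
if stats["O"] > 70:  # illustrative threshold only
    print("old generation is %.1f%% full" % stats["O"])
```

Other JDK versions print different columns, so the header should be read from the actual output rather than hard-coded.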

3.jinfo: view and adjust JVM parameters

    jinfo -h  
    Usage:  
        jinfo [option] <pid>  
            (to connect to running process)  
        jinfo [option] <executable <core>  
            (to connect to a core file)  
        jinfo [option] [server_id@]<remote server IP or hostname>  
            (to connect to remote debug server) 

    where

4.jmap: inspect the heap, for example list the live objects

jmap -histo:live pid|grep com.company|less

Or dump the heap for later analysis

jmap -dump:file=test.bin pid

Show detailed information about the Java heap with jmap

jmap -heap pid

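5.Analyze the heap dump file with jhat:

jhat test.bin

Once the analysis has finished, the heap can be inspected in a web browser.
The dump can also be analyzed with Eclipse Memory Analyzer (MAT) or IBM's Heap Analyzer.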

6.jstack: Java stack trace tool

jstack pid > thread_dump 

jstack -l prints additional information about locks along with the stack traces
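A first pass over a thread dump is often just counting thread states. The Python 3 sketch below does that with a regular expression on the `java.lang.Thread.State:` lines; `count_thread_states` is a hypothetical helper and the excerpt is a shortened, made-up dump, not real output.

```python
import re
from collections import Counter

STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")

def count_thread_states(dump_text):
    """Count how many threads are in each state in a jstack thread dump."""
    return Counter(STATE_RE.findall(dump_text))

# Shortened, made-up excerpt in the usual jstack layout.
sample_dump = '''
"main" prio=10 tid=0x01 nid=0x1 runnable [0x0001]
   java.lang.Thread.State: RUNNABLE
"worker-1" prio=10 tid=0x02 nid=0x2 waiting for monitor entry [0x0002]
   java.lang.Thread.State: BLOCKED (on object monitor)
"worker-2" prio=10 tid=0x03 nid=0x3 waiting for monitor entry [0x0003]
   java.lang.Thread.State: BLOCKED (on object monitor)
'''
states = count_thread_states(sample_dump)
print(states.most_common())
```

In practice the text would come from the thread_dump file written by the jstack command above; a large number of BLOCKED threads is a hint to look at the lock information printed by jstack -l.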

7.jvisualvm and jconsole: graphical tools bundled with the JDK

8.Btrace

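####References

"Understanding the Java Virtual Machine in Depth" (深入理解Java虚拟机) by Zhou Zhiming, chapter 4
Common Tools for Java Programmers (Java程序员常用工具集) - 庄周梦蝶
Complete List of Java 6 JVM Options (Chinese edition)
JDK Tools and Utilities
Command-Line Options - Troubleshooting Guide for HotSpot VM
JDK Troubleshooting Guide
Java HotSpot VM Options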

2012-12-19

Unicode Python

unicode in python

In Python you will often run into an error like this:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)

I. Default encoding (defaultencoding)

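Python's default encoding can be obtained with sys.getdefaultencoding().

import sys
print sys.getdefaultencoding()

In Python 2 the default encoding is ascii on every platform, Windows and Linux alike, unless it has been changed; what differs between platforms is the terminal and file system encoding.

Setting the default encoding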

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

II. str and unicode

#####1. UTF-8 encoded str

>>>"中文"
'\xe4\xb8\xad\xe6\x96\x87'

>>> type("中文")
<type 'str'>

>>> len("中文")
6

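In UTF-8 each of these Chinese characters takes 3 bytes, so the two characters have a length of 6.
A Python str is only a sequence of bytes in some encoding, not a sequence of characters in the true sense.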

#####2. unicode

>>> u"中文"
u'\u4e2d\u6587'

>>> type(u"中文")
<type 'unicode'>

>>> len(u"中文")
2

The unicode object consists of the Unicode code points of the two characters, so its length is 2.
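For comparison, here is the same distinction in Python 3, where the roles are played by `str` (code points) and `bytes` (encoded data). This is a minimal sketch added for illustration; the session above is Python 2.

```python
text = "中文"                   # Python 3 str: a sequence of code points
data = text.encode("utf-8")     # bytes: the UTF-8 encoded form

print(len(text))   # 2 characters
print(len(data))   # 6 bytes
print(data)
```

The length of the text is counted in characters, the length of the encoded data in bytes.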

####III. Encoding conversion

#####1. Converting unicode to a target encoding

uni_str = u"中文"
utf8_str = uni_str.encode("utf-8")
gbk_str = uni_str.encode("gbk")
gb2312_str = uni_str.encode("gb2312")
utf16_str = uni_str.encode("utf-16")

Chinese characters cannot be encoded as ascii, so an English string is used here.

uni_str = u"chinese"
asc_str = uni_str.encode("ascii")
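What happens when non-ASCII text is encoded as ascii anyway can be sketched in Python 3 as follows. The `errors` argument is a standard parameter of `encode()`; the variable names are only illustrative.

```python
text = "中文"

try:
    text.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True   # ASCII has no byte for these characters

# Error handlers trade the exception for a lossy or escaped result.
replaced = text.encode("ascii", errors="replace")            # one "?" per character
escaped = text.encode("ascii", errors="backslashreplace")    # code points kept as escapes

print(failed, replaced, escaped)
```

The strict default raises an exception, while the handlers make the loss explicit in the output instead.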

#####2. Converting other encodings to unicode

us1 = unicode(utf8_str,"utf-8")    
us2 = unicode(gbk_str,"gbk")    
us3 = unicode(gb2312_str,"gb2312")    
us4 = unicode(utf16_str,"utf-16")    
us5 = unicode(asc_str,"ascii")

str.decode() can also be used to convert a str to unicode

us1 = "中文".decode("utf-8")
us2 = gbk_str.decode("gbk")

#####3. Checking whether a string is str or unicode

isinstance(s, str) tests whether s is a plain (byte) string
isinstance(s, unicode) tests whether s is unicode

IV. Encoding of Python source files

Source file header: # -*- coding=utf-8 -*- or #coding=utf-8

◆ sys.setdefaultencoding("utf-8") is not necessarily suitable for every Python version.
Solution

Decode early
Unicode everywhere
Encode late
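The three rules above can be sketched in a few lines. The example is written in Python 3 (where the text type is `str`), and `shout` is a hypothetical helper that exists only to show where the boundaries are.

```python
def shout(raw, encoding="utf-8"):
    """Hypothetical helper: bytes in, bytes out, text in between."""
    text = raw.decode(encoding)      # decode early: bytes -> text at the input boundary
    text = text.strip().upper()      # unicode everywhere: work on text only
    return text.encode(encoding)     # encode late: text -> bytes at the output boundary

incoming = " héllo 中文 \n".encode("utf-8")
outgoing = shout(incoming)
print(outgoing.decode("utf-8"))
```

Keeping the decode and encode calls at the edges means the logic in the middle never has to know which encoding the data arrived in.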

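#####1. Reading and writing files

# coding: UTF-8

f = open('test.txt')
s = f.read()
f.close()
print type(s) # <type 'str'>
# The file is known to be UTF-8 encoded, so decode it to unicode
u = s.decode('UTF-8')

When a file is opened with the built-in open(), read() returns a str, which then has to be decoded with the correct encoding.

f = open('test.txt', 'w')
# Encode to a UTF-8 str
s = u.encode('UTF-8')
f.write(s)
f.close()

When write() is given unicode, encode() it with the encoding you want in the file. If it is given a str in some other encoding, first decode() it with that str's encoding to get unicode, then encode() it with the target encoding.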

#####2. Decoding depending on whether the string is already unicode

def to_unicode_or_bust(obj, encoding='utf-8'):
     if isinstance(obj, basestring):
         if not isinstance(obj, unicode):
             obj = unicode(obj, encoding)
     return obj
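Python 3 has no `basestring` or `unicode`, so a rough equivalent of the helper above checks for `bytes` instead. The sketch below is an adaptation under that assumption; `to_text` is a hypothetical name, not a standard function.

```python
def to_text(obj, encoding="utf-8"):
    """Return str for bytes-like input; leave every other object untouched."""
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj).decode(encoding)
    return obj

print(to_text(b"\xe4\xb8\xad\xe6\x96\x87"))    # bytes are decoded
print(to_text("中文"))                          # str passes through unchanged
print(to_text("中文".encode("gbk"), "gbk"))     # a non-default encoding is given explicitly
```

As with the original helper, calling it on a value that is already text is harmless, which makes it safe to apply at every input boundary.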

#####3. Reading and writing files with codecs

import codecs

f = codecs.open('test.txt', encoding='UTF-8')
u = f.read()
f.close()
print type(u) # <type 'unicode'>

f = codecs.open('test.txt', 'a', encoding='UTF-8')
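# Write unicode
f.write(u)

# Writing a str triggers an implicit decode followed by an encode
# A GBK encoded str
s = '汉'
print repr(s) # '\xba\xba'
# The str is first decoded to unicode with the default encoding, then encoded to UTF-8 and written;
# this only works if the default encoding is GBK, otherwise a UnicodeDecodeError is raised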
f.write(s) 
f.close()

#####4. BOM file header

Decoding UTF-16 removes the BOM automatically, but decoding UTF-8 does not; use s.decode('utf-8-sig') for that.
chardet.detect() can also be used to detect a file's encoding.
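The difference between the two codecs is easy to see on a byte string that starts with a BOM. A minimal Python 3 sketch, using only the standard `codecs` module:

```python
import codecs

raw = codecs.BOM_UTF8 + "中文".encode("utf-8")   # UTF-8 bytes that start with a BOM

with_bom = raw.decode("utf-8")          # the BOM survives as U+FEFF
without_bom = raw.decode("utf-8-sig")   # the BOM is stripped

utf16_roundtrip = "中文".encode("utf-16").decode("utf-16")   # BOM added on encode, removed on decode

print(repr(with_bom), repr(without_bom), repr(utf16_roundtrip))
```

Decoding with utf-8-sig is also safe for input without a BOM, since the codec only skips the mark when it is present.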

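V. Python 3

In Python 3, <type 'str'> is a unicode object.
There is a separate <type 'bytes'> type.
All built-in modules support unicode.
The u'text' notation is no longer needed.
open() accepts an encoding parameter, just like codecs.open().
The default encoding (sys.getdefaultencoding() and the source file encoding) is UTF-8.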
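The `encoding` parameter of the built-in `open()` can be sketched like this in Python 3. The temporary path and the choice of GBK are only for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")   # throwaway file for the demo

with open(path, "w", encoding="gbk") as f:
    f.write("中文")                  # str in, encoded to GBK on write

with open(path, encoding="gbk") as f:
    text = f.read()                  # decoded back to str on read

with open(path, "rb") as f:
    raw = f.read()                   # binary mode returns the stored bytes

print(text, raw)
```

Passing `encoding` explicitly is the safer habit, because the default used by `open()` in text mode depends on the platform and locale.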

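VI. API reference

str(object='')
Returns a nicely printable string representation of an object; for a string object, the string itself is returned. str() aims for readability and its result cannot always be passed to eval(), whereas repr(object) tries to return a string that eval() accepts.

unicode(object='')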
unicode(object[, encoding[, errors]])

str.decode([encoding[, errors]])

str.encode([encoding[, errors]])

basestring()
The superclass of str and unicode. It cannot be called or instantiated directly, but it can be used to test whether an object is an instance of str or unicode.
isinstance(obj, basestring) is equivalent to isinstance(obj, (str, unicode))

isinstance(object, classinfo)
Return true if the object argument is an instance of the classinfo argument, or of a (direct, indirect or virtual) subclass thereof.

bytearray([source[, encoding[, errors]]])
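bytearray() returns a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256.
The source parameter:

If it is a string, bytearray() converts it the same way str.encode() does (for unicode input an encoding must be given)
If it is an integer, the array has that length and is initialized with null bytes
If it is an object that implements the buffer interface, a read-only buffer of that object is used to initialize the array
If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used to initialize the array
Without an argument, an array of length 0 is created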
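The different kinds of `source` can be tried out directly. The sketch below uses Python 3, where a string source always requires an encoding; the variable names are illustrative.

```python
from_text = bytearray("中", "utf-8")     # string source: an encoding is required
zeros = bytearray(3)                      # integer source: that many null bytes
from_ints = bytearray([72, 105])          # iterable of ints in range(256)
empty = bytearray()                       # no argument: length 0

from_ints.append(33)                      # unlike bytes, a bytearray is mutable

print(from_text, zeros, from_ints, empty)
```

An out-of-range value such as `bytearray([256])` raises a ValueError, which is how the 0 <= x < 256 constraint shows up in practice.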

chr(i)
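Returns a string of one character whose ASCII code is the integer i; for example, chr(97) returns the string 'a'. This is the inverse of ord(). The argument must be in the range [0..255]; a ValueError is raised if it is outside that range.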

unichr(i)
Returns the unicode string of one character whose Unicode code is the integer i.
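In Python 3 the two functions are merged: `chr()` accepts any Unicode code point and `unichr()` no longer exists. A short sketch of the round trip with `ord()`:

```python
letter = chr(97)                          # an ASCII code point
han = chr(0x4e2d)                         # a code point far beyond 255
code_points = [ord(c) for c in "中文"]    # ord() is the inverse of chr()

print(letter, han, [hex(cp) for cp in code_points])
```

The code points match the `u'\u4e2d\u6587'` escape shown earlier for the same two characters.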

type(object)
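Returns the type of object. isinstance() can also be used to test whether an object is of a given type.

type(name, bases, dict)
Returns a new type object.
name becomes the __name__ attribute, bases becomes the __bases__ attribute, and dict becomes the __dict__ attribute.
Example: X = type('X', (object,), dict(a=1))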
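The three-argument form is equivalent to a class statement, which a few attribute lookups confirm. A minimal sketch that behaves the same in Python 2 and Python 3:

```python
X = type("X", (object,), dict(a=1))   # same as: class X(object): a = 1

x = X()
print(X.__name__, X.__bases__, x.a)
```

The name, the tuple of base classes and the namespace dictionary end up in `__name__`, `__bases__` and `__dict__` respectively.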

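####VII. Encoding standards

ISO 8859-1
ISO/IEC 8859-1, also known as Latin-1 or "Western European", is the first 8-bit character set in the ISO/IEC 8859 series of the International Organization for Standardization.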

####VIII. Encoding handling in tornado

How tornado handles the encoding of HTTP headers:

HTTP headers are generally ascii (officially they’re latin1, but use of non-ascii is rare), so we mostly represent them (and data derived from them) with native strings (note that in python2 if a header contains non-ascii data tornado will decode the latin1 and re-encode as utf8!)

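In other words, tornado decodes the non-ASCII bytes in a header as latin1 (ISO-8859-1) and then re-encodes the result as UTF-8.

To get the correct value of a header that contains Chinese text (UTF-8 encoded), first decode it as UTF-8, then encode it as latin1, and finally decode it as UTF-8, as follows: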

value = header_value.decode('utf-8').encode('latin-1').decode('utf-8')
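Why the round trip recovers the text can be shown without tornado at all. The Python 3 sketch below only models the assumed behavior (UTF-8 bytes misread as latin1); it starts from the already decoded string, so it needs one decode step fewer than the Python 2 line above.

```python
original = "中文"
wire_bytes = original.encode("utf-8")          # what the client actually sent

# Assumed server-side behavior: the bytes are interpreted as latin1.
mangled = wire_bytes.decode("latin-1")

# Undo it: back to the original bytes, then decode with the real encoding.
recovered = mangled.encode("latin-1").decode("utf-8")

print(repr(mangled), recovered)
```

The trick works because latin1 maps every byte value to exactly one character, so encoding the mangled text as latin1 restores the original bytes without loss.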

References

Unicode In Python, Completely Demystified
Python Character Encoding Explained (Python字符编码详解)
Summary of Python Encoding Issues (python 编码问题总结)
Converting between str and bytes in Python (python str与bytes之间的转换)
Guaranteed conversion to unicode or byte string
Built-in Functions - Python docs
Standard Encodings - Python docs
Strings and Unicode - encoding handling in tornado
Unicode - Wikipedia
UTF-8 - Wikipedia
ISO/IEC 8859-1 - Wikipedia