SSLCertificateChainFile

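I have been using the free SSL certificate from StartCom for several years. Today I happened to open the site in Firefox and got an error.

After searching around, I found the answer on Stack Overflow: adding the SSLCertificateChainFile directive fixed it.

Along the way I found a tool called SiteCheck, which gave me important information about the certificate's expiry.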

Practices of using MySQL and DBPool

This is a draft of a document written for my company; a Chinese version will follow later.

Summary

In a large cluster environment, managing hundreds of MySQL databases is always a challenge:

Planned downtime for database server maintenance
Scalability
Changes to data structures

Ideally, all of these are handled by a proper development process, with enough software engineers to support it.
But in the real world, solving problems is more urgent.

Here are the 9 best practices we use to operate thousands of app servers and databases without any “user impact” from database maintenance.

Database design:

1 Global design

We design our database schema like a one-node cluster, so we can run all the services on one box or on 1000 boxes.
This is transparent to software engineers.

2 Simple only

Use only basic MySQL features: table, primary key, index, replication.

Application design:

3 Use the app server’s CPU

App servers are scalable, but MySQL is a bottleneck. Use as much application CPU as possible, which means:
Do not query without an index
Do not sort in the database
Do not use any query that causes temporary tables
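
As a minimal sketch of the "do not sort in the database" rule, the rows can be fetched through an indexed lookup without ORDER BY and then sorted on the app server. The ScoreRow type, its fields, and the query in the comment are assumptions made up for illustration; they are not from our schema.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical row type; field names are assumptions for illustration only.
class ScoreRow {
    final long userId;
    final int score;

    ScoreRow(long userId, int score) {
        this.userId = userId;
        this.score = score;
    }
}

class AppSideSortSketch {
    // Sort rows by score, highest first, on the app server.
    // The rows are assumed to come from an indexed query without ORDER BY,
    // e.g. "SELECT user_id, score FROM scores WHERE group_id = ?".
    static List<ScoreRow> topByScore(List<ScoreRow> rows, int limit) {
        List<ScoreRow> sorted = new ArrayList<>(rows); // leave the input untouched
        sorted.sort(Comparator.comparingInt((ScoreRow r) -> r.score).reversed());
        return sorted.subList(0, Math.min(limit, sorted.size()));
    }
}
```

The sort cost moves to the app tier, which can be scaled by adding boxes, while the database only performs the index lookup.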

4 Use an abstraction layer over tables

We have used DBPool for a long time.

Ops workflow:

5 Vertical partitioning

When you want to move a few tables to a different master:
a. Set up the new master (B) as a slave of the old master (A).
b. Change the DBPool configuration to point the master to B.
c. During the transition, (B) will also receive some update requests from old clients.
d. Make sure there are no more queries to these tables on A.
e. Stop the replication and optionally drop the old tables on A.

6 Horizontal sharding

When you want to distribute the data of one table across more physical servers:
a. Estimate how many shards are needed and find a proper sharding key. After sharding, each request should visit only one instance.
b. A good number of shards may be 10 or 100. It is human-friendly when debugging.
c. You don’t need 100 physical servers to deploy all the tables; DBPool is able to route.
d. Use the same method shown in (5) to move tables to a new master.
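
The routing idea in (a) to (c) can be sketched as follows. This is not DBPool's API; the class, method names, and host names are hypothetical, and the modulo mapping from shard to host is a simplifying assumption. With 100 shards, the shard number is simply the last two digits of a numeric key, which is what makes it easy to read while debugging.

```java
import java.util.List;

// A minimal sketch of logical-to-physical shard routing.
// NOT DBPool's API: every name here is hypothetical.
class ShardRouterSketch {
    static final int SHARD_COUNT = 100;        // shard = last two digits of the key
    private final List<String> physicalHosts;  // may be far fewer than SHARD_COUNT

    ShardRouterSketch(List<String> physicalHosts) {
        if (physicalHosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one host");
        }
        this.physicalHosts = physicalHosts;
    }

    // Logical shard for a numeric sharding key.
    static int shardOf(long key) {
        return (int) Math.floorMod(key, (long) SHARD_COUNT);
    }

    // Table name for the shard, e.g. blog_05 for key 5.
    static String tableName(String base, long key) {
        return String.format("%s_%02d", base, shardOf(key));
    }

    // Physical host that currently serves the shard of this key.
    String hostOf(long key) {
        return physicalHosts.get(shardOf(key) % physicalHosts.size());
    }
}
```

In practice the shard-to-host mapping would more likely be an explicit configuration table rather than a modulo, so that a single shard can be moved to a new master using (5) without touching the others.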

7 Change of data structures

a. We only add columns; we never drop columns.
b. Application-level compatibility is required. Make sure the new code works with both old and new data (if that is impossible, see (8)).
c. Make the changes.
d. Update the application to use the new column for the new feature.
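
A minimal sketch of what the compatibility in (b) can look like in application code: the reader tolerates rows that do not have the new column yet and falls back to the old one. The column names "nickname" and "name", and the use of a Map as the row type, are assumptions for illustration only.

```java
import java.util.Map;

// Hypothetical example: "nickname" is a newly added column, "name" is the old one.
class CompatibleReaderSketch {
    // Works with rows loaded before and after the column was added:
    // old rows have no "nickname" key (or a null value), new rows may have one.
    static String displayName(Map<String, Object> row) {
        Object nickname = row.get("nickname");
        if (nickname != null) {
            return nickname.toString();
        }
        return String.valueOf(row.get("name"));
    }
}
```

Code written this way can be deployed before step (c), so the order of the schema change and the application release no longer matters.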

8 Data migration

This situation always involves a big change to the logic, where you need to redesign the structure.
a. Create a new master (B) for the tables using (5).
b. Create the new data structures on (B).
c. Create a trigger on (B) that updates the new structure when the old data changes.
d. Migrate your old data into the new structure, keeping in mind that (c) has already moved some recent data.
e. Create a new abstract instance in DBPool for the new structures.
f. Update the application to use the new structure for reading.
g. In the meantime, old clients and new clients see the same data because of (c).
h. Update the application to use the new structure for writing.
i. Stop the replication from (a), remove the trigger from (c), and drop the old tables.

9 Planned maintenance

a. DBPool is enough to move MySQL slave servers.
b. Use (5) to create a new master, or promote a slave to master.

OpenSSL and cURL for iOS

Yesterday, while updating my iOS app, I also updated two libraries it depends on, OpenSSL and cURL.
Both were upgraded to the latest version, with added support for the 64-bit arm64 CPU of the new iPhone 5s.

Approach

The key to cross-compiling these two libraries is two parameters: -isysroot and -miphoneos-version-min.
cURL is configured by combining different parameters.
OpenSSL already has a built-in target called iphoneos-cross; I modified some parameters on top of it to support arm64 and the simulator.

Code

The code for both projects is on GitHub, tested with iOS SDK 7.1 on Mac OS X 10.8.
https://github.com/sinofool/build-openssl-ios
https://github.com/sinofool/build-libcurl-ios

Usage

You do not need to git clone these scripts. Download the OpenSSL and cURL sources from their official websites and unpack them, then run the scripts directly from GitHub.
The build results will be placed on the desktop.

curl -O http://www.openssl.org/source/openssl-1.0.1f.tar.gz
tar xf openssl-1.0.1f.tar.gz
cd openssl-1.0.1f
curl https://raw.githubusercontent.com/sinofool/build-openssl-ios/master/build_openssl_dist.sh |bash

The same goes for cURL:

curl -O http://curl.haxx.se/download/curl-7.35.0.tar.gz
tar xf curl-7.35.0.tar.gz
cd curl-7.35.0
curl https://raw.githubusercontent.com/sinofool/build-libcurl-ios/master/build_libcurl_dist.sh |bash

new Life(location)

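I have not updated this blog for a long time. This past year I was busy with work and with getting married, and last month I arrived in Canada to start a brand-new life.

A lot of documentation gets written at work here, so there is a better chance that I can turn the publishable parts of my work into documents and share them here.

Recently I have been modifying the simplecaptcha project (http://simplecaptcha.sourceforge.net/), adding and tidying up some features. The modified code is published on GitHub as ownCAPTCHA (https://github.com/sinofool/ownCAPTCHA).
When the next version is done, I will post some examples here.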

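iCoupon blacklist

Since May 27, 2013, I have been continuously receiving spam messages via iMessage. Spam alone would be one thing, since it has been going on for a while, but this iCoupon tries to whitewash itself: its so-called unsubscribe link even requires you to leave your phone number, and it falsely claims a system failure to trick people into unsubscribing and leaving more information.
Since there is currently no way to block these messages, and I do not want to jailbreak and let in an even bigger rogue, the only option is to boycott the products advertised this way.

Below are the brands that have been promoted through iCoupon, listed here and blacklisted.

May 2013
甜风集, cakes;
蓝色动力, car maintenance;

June 2013
臆蜜坊, breast enhancement;
吾爱吾庐连锁公寓, hotel;
狗屎咖啡, coffee;

July 2013
火宫殿, restaurant;
DHC, cosmetics;
世纪奥桥花卉园艺超市, flowers;
羽丹纤美, slimming;
皙荷美道, breast enhancement;

******Continuously updated******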

Switched this domain's DNS to Amazon Route 53 on June 17

Noted here for comparison next week.

Build Google protobuf 2.4.1 for iOS development

Although iOS 5 ships with a private version of protobuf, it is too old for me.
Here is the script I used to build protobuf.
1. Download the newest protobuf (2.4.1 at the moment) and unpack it.
2. Run this script in the unpacked directory. It will create a folder named “protobuf_dist” on the desktop.
3. Copy or add protobuf_dist to your Xcode project. That’s all.
#!/bin/bash

TMP_DIR=/tmp/protobuf_$$

###################################################
# Build the i386 version first,
# because the arm build needs its protoc binary.
###################################################
CFLAGS=-m32 CPPFLAGS=-m32 CXXFLAGS=-m32 LDFLAGS=-m32 ./configure --prefix=${TMP_DIR}/i386 \
	--disable-shared \
	--enable-static || exit 1
make clean || exit 2
make -j8 || exit 3
make install || exit 4

###################################################
# Build the armv7 version.
###################################################
SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS5.1.sdk
DEVROOT=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer

export CC=${DEVROOT}/usr/bin/llvm-gcc
export CFLAGS="-arch armv7 -isysroot $SDKROOT"
export CXX=${DEVROOT}/usr/bin/llvm-g++
export CXXFLAGS="$CFLAGS"
export LDFLAGS="-isysroot $SDKROOT -Wl,-syslibroot $SDKROOT"

./configure --prefix=$TMP_DIR/armv7 \
	--with-protoc=${TMP_DIR}/i386/bin/protoc \
	--disable-shared \
	--enable-static \
	--host=arm-apple-darwin10 || exit 1
make clean || exit 2
make -j8 || exit 3
make install || exit 4

###################################################
# Packing: headers, the i386 protoc, and a fat static library.
###################################################
DIST_DIR=$HOME/Desktop/protobuf_dist
rm -rf ${DIST_DIR}
mkdir -p ${DIST_DIR}
mkdir ${DIST_DIR}/{bin,lib}
cp -r ${TMP_DIR}/armv7/include ${DIST_DIR}/
cp ${TMP_DIR}/i386/bin/protoc ${DIST_DIR}/bin/
lipo -arch i386 ${TMP_DIR}/i386/lib/libprotobuf.a -arch armv7 ${TMP_DIR}/armv7/lib/libprotobuf.a -output ${DIST_DIR}/lib/libprotobuf.a -create

This was tested on OS X Lion with Xcode 4.2.1.

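Data analysis with Hive

After rolling out streaming-style data analysis on a large scale, we found that although this model has a low barrier to entry, its execution efficiency is equally low.
Every map task has to start two processes on the TaskTracker: one java process and one perl/bash/python process.
Input and output are each copied one extra time.

After a series of investigations, we started rewriting some of the streaming jobs in Hive.

What is Hive?

Hive is a SQL parsing engine that runs on a single machine; it does not itself run on Hadoop.
Hive parses SQL into MapReduce jobs, which run on Hadoop.
Using Hive lowers communication costs, because SQL syntax is widely known.
Jobs translated by Hive are reasonably efficient, but still not as efficient as optimized pure MapReduce jobs.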

Data preparation

The raw log file looks like this:
1323431269786 202911262 RE_223500512 AT_BLOG_788514510 REPLY BLOG_788514510_202911262

The fields are <time> <actor> [[note] [note] ...] <action> <entity>
For the example above, this means:

<time>: 1323431269786
<actor>: 202911262
[note]: RE_223500512
[note]: AT_BLOG_788514510
<action>: REPLY
<entity>: BLOG_788514510_202911262

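Extending Hive's Deserializer

To analyze data with SQL, Hive has to know how to split a whole log line. Hive provides an interface that lets us plug in our own serialization and deserialization methods.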


import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.Deserializer;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.io.Writable;

public class RawActionDeserializer implements Deserializer {

  @Override
  public Object deserialize(Writable obj) throws SerDeException {
    // TODO Auto-generated method stub
    return null;
  }

  @Override
  public ObjectInspector getObjectInspector() throws SerDeException {
    // TODO Auto-generated method stub
    return null;
  }

  @Override
  public void initialize(Configuration conf, Properties props)
      throws SerDeException {
    // TODO Auto-generated method stub
  }

}
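The three methods do the following:

initialize: called at startup; adjusts behavior or allocates resources based on runtime parameters.
getObjectInspector: returns the names and types of the fields.
deserialize: deserializes each row of data and returns the result.

Defining the table structure

In our example the fields have fixed meanings, so there is no need to configure runtime parameters in the initialize method. We define the fields as static, as follows.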
  // Field names and their inspectors, in the same order as the log fields.
  private static List<String> structFieldNames = new ArrayList<String>();
  private static List<ObjectInspector> structFieldObjectInspectors =
      new ArrayList<ObjectInspector>();

  static {
    structFieldNames.add("time");
    structFieldObjectInspectors.add(ObjectInspectorFactory
        .getReflectionObjectInspector(Long.TYPE, ObjectInspectorOptions.JAVA));

    structFieldNames.add("id");
    structFieldObjectInspectors.add(ObjectInspectorFactory
        .getReflectionObjectInspector(
            java.lang.Integer.TYPE, ObjectInspectorOptions.JAVA));

    // "adv" holds the optional notes, so it is a list of strings.
    structFieldNames.add("adv");
    structFieldObjectInspectors.add(ObjectInspectorFactory
        .getStandardListObjectInspector(
            ObjectInspectorFactory.getReflectionObjectInspector(
                String.class, ObjectInspectorOptions.JAVA)));

    structFieldNames.add("verb");
    structFieldObjectInspectors
        .add(ObjectInspectorFactory.getReflectionObjectInspector(
            String.class, ObjectInspectorOptions.JAVA));

    structFieldNames.add("obj");
    structFieldObjectInspectors
        .add(ObjectInspectorFactory.getReflectionObjectInspector(
            String.class, ObjectInspectorOptions.JAVA));
  }

  @Override
  public ObjectInspector getObjectInspector() throws SerDeException {
    return ObjectInspectorFactory.getStandardStructObjectInspector(
        structFieldNames, structFieldObjectInspectors);
  }

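Defining the parsing function

So that Java MapReduce jobs can reuse the code, we implemented a class outside of Hive that has no dependency on it; the code is not pasted here. The class defines member variables matching the log fields and provides a static valueOf method that constructs an instance from a string.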
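
Since the real class is not shown, here is a minimal sketch of what such a Hive-independent parser could look like. It is an illustration, not the actual RawAction class: the name RawActionSketch, the whitespace splitting, and the error handling are assumptions, while the field names follow the table definition above. The first two tokens are the time and the actor, the last two are the action and the entity, and everything in between is treated as notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the RawAction class described in the text.
class RawActionSketch {
    final long time;        // <time>
    final int id;           // <actor>
    final List<String> adv; // zero or more [note] fields
    final String verb;      // <action>
    final String obj;       // <entity>

    private RawActionSketch(long time, int id, List<String> adv, String verb, String obj) {
        this.time = time;
        this.id = id;
        this.adv = adv;
        this.verb = verb;
        this.obj = obj;
    }

    // Build an instance from one raw log line.
    // Assumes fields are separated by whitespace and contain none themselves.
    static RawActionSketch valueOf(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 4) {
            throw new IllegalArgumentException("expected at least 4 fields: " + line);
        }
        int n = parts.length;
        // Everything between the actor and the action is a note.
        List<String> adv = new ArrayList<>(Arrays.asList(parts).subList(2, n - 2));
        return new RawActionSketch(
            Long.parseLong(parts[0]), Integer.parseInt(parts[1]),
            adv, parts[n - 2], parts[n - 1]);
    }
}
```

Keeping the parser free of Hive types means the same valueOf call can be used from a plain MapReduce job and from the deserializer.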
  @Override
  public Object deserialize(Writable blob) throws SerDeException {
    if (blob instanceof Text) {
      String line = ((Text) blob).toString();
      RawAction act = RawAction.valueOf(line);
      List