Twenty Years as a Programmer

Author: Bochun Bai

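Books Get Thinner the More You Read Them

Code, likewise, gets shorter the more you write it.

I just went back through the code I wrote recently and deleted about 10% of it. This is something worth doing regularly.

WeChat Open Platform Java SDK: https://github.com/sinofool/wechat-java-sdk

Alipay Java SDK: https://github.com/sinofool/alipay-java-sdk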

2015-02-03
Protobuf for iOS
https://github.com/sinofool/build-protobuf-ios

Use Google Protobuf in iOS development. Now updated for the latest iOS SDK 8.1.

Build Google Protobuf for iOS development. This repository contains only the build script; Google Protobuf itself is available here: http://code.google.com/p/protobuf/

Tested with iOS SDK 8.1 on Mac OS X 10.8 and Google Protobuf 2.6.1.
** If the older version 2.5.0 is needed, check the repository history and find patch-arm64.patch

Binary output will be on your Desktop and named “protobuf_dist”

Usage
```
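# Download and unpack the Protobuf 2.6.1 source
curl -O https://protobuf.googlecode.com/files/protobuf-2.6.1.tar.bz2
tar xf protobuf-2.6.1.tar.bz2
cd protobuf-2.6.1
# Fetch the build script and run it from inside the source tree
curl https://raw.githubusercontent.com/sinofool/build-protobuf-ios/master/build_protobuf_dist.sh |bash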
......
```
2014-12-03
ownCAPTCHA

https://github.com/sinofool/ownCAPTCHA

A CAPTCHA implementation improved from the simplecaptcha project.

2014-12-03
OpenSSL for iOS

https://github.com/sinofool/build-openssl-ios

Builds libssl.a and libcrypto.a for iOS development.

2014-12-03
DBPool
https://github.com/sinofool/dbpool

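This is a management system for database configuration. Its main features are:
- Centralized configuration of database addresses, ports, usernames, passwords, and so on;
- An abstract notion of a database instance
  - supports read/write splitting;
  - supports rule-based hashing (sharding);
- Supports editing the configuration online, with clients updated in real time;
- Supports MySQL, and can be extended to other database types;
- Supports a Java client, and can be extended to other languages;
There are a few tricks to using it; see: Practices of using MySQL and DBPool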
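
To make the "abstract instance" idea above concrete, here is a minimal sketch of how a client might resolve a logical instance to a physical server. It is not DBPool's actual API: the names `Endpoint`, `Shard`, `Instance`, and `route`, the CRC32-modulo hash rule, the round-robin replica choice, and the hostnames are all assumptions made for illustration. Live configuration updates are not modeled.

```python
import zlib
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Endpoint:
    host: str
    port: int = 3306  # default MySQL port


@dataclass
class Shard:
    """One physical group: a single writable master plus read-only replicas."""
    master: Endpoint
    replicas: list = field(default_factory=list)

    def __post_init__(self):
        # Round-robin over replicas; fall back to the master if there are none.
        self._readers = cycle(self.replicas or [self.master])

    def pick(self, write: bool) -> Endpoint:
        return self.master if write else next(self._readers)


class Instance:
    """A logical database instance hiding sharding and read/write splitting."""

    def __init__(self, shards):
        self.shards = list(shards)

    def route(self, key: str, write: bool = False) -> Endpoint:
        # Hash rule (an assumption): CRC32 of the key modulo the shard count.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index].pick(write)


# Hypothetical topology: two shards, each with its own replicas.
users = Instance([
    Shard(Endpoint("db0-master"), [Endpoint("db0-r1"), Endpoint("db0-r2")]),
    Shard(Endpoint("db1-master"), [Endpoint("db1-r1")]),
])
write_target = users.route("user:42", write=True)  # always a master
read_target = users.route("user:42")               # a replica of the same shard
```

The application only ever names the logical instance and a key; which host it ends up talking to is decided entirely by the centrally managed configuration.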
2014-12-03
What is the best way to learn big data technologies?

Answer by Brent Bai:

I have to say, my big data career started with "small" data.
You need real big data in hand to understand why these technologies were designed.
Most big data frameworks are slower than a centralised solution when the data is only a few hundred gigabytes.

Big data is an expensive toy.


2014-10-21
What aspect of Chinese characteristics contributed to its huge population throughout history?

Interesting question and answers

Answer by Andy Lee Chaisiri:

Chinese technology was 1,000+ years ahead of everyone else

Like this, but with horses and rice.

Imagine if today's crops suddenly became 30x more productive, that would cause a population boom, right? Agriculture is how human populations exploded in size compared to hunter-gatherer civilizations. So let's talk about some of those tools of agriculture and how population booms were achieved in an era of horse and plow:

Seed Drill: "What if we planted the seeds under the soil?"

Seed drills are tools that bury seeds at a correct depth in a timely manner. Planting seeds at a good depth increases the chances of an individual seed sprouting, without being eaten by birds. The use of seed drills also allows for planting in nice orderly rows with good spacing so the sprouting plants have enough room to draw nutrients from the soil without mutually starving each other. Not every grain will germinate, but using seed drills to plant crops in rows increases the chances of any individual grain germinating. This allows you to eat more grains because you know only a small quantity is needed to replant fields.

Chinese were using metal multi-tubed seed drills as early as 200BC. Seed drills make an appearance in Europe in 1566AD, about 1700 years after their appearance in China. As for how they were planting seeds before that…

Limbourg Brothers for the Duc de Berry (ca. 1415), 'Les Très Riches Heures'

You had a guy with a bag of seeds planting them by hand, then another guy raking over the earth to cover them. That method leaves a lot of seeds exposed to be eaten by birds or planted too shallow to germinate. The crops that do germinate compete with other plants growing too close to them, and weeding the fields becomes very difficult, if not a waste of time. Of the grain you wind up harvesting, a larger share has to be set aside for future planting, so less is eaten.

Compared to this hand planting method, using a seed drill to plant crops in rows is 10x-30x more efficient in terms of how much grain you can harvest vs needing to save them for the next planting.

Iron Mouldboard Plough: "Metal cuts better than wood?"

Imagine a plough. You'd probably think of something made out of metal (perhaps with a wedge), right? Well, ploughs weren't always like that. The earliest ploughs in human history were basically a plank of wood that you knifed into the ground. Around 300BC, Chinese started using ploughs shaped so that they simultaneously cut into the earth and turned it; by 100CE, they were made entirely out of iron. Turning the earth is important for getting more nutrients out of your land, and can even turn 'barren' land fertile.

Around 400AD, a similar mouldboard plough appears in the Roman empire, but widespread adoption is delayed with the fall of the empire. In 1700AD Dutch traders brought Chinese iron mouldboard ploughs back to Europe, and an agricultural revolution soon followed. Now, what was ploughing like without an iron mouldboard plough?

A painting from the 16th century showing a farmer at work, by Pieter Bruegel

That is a piece of wood being used to slice into the ground. Because that wooden plough doesn't have a mouldboard the cut soil needs to be tilled through further labor. Iron was expensive and labor intensive to produce, so at best you would have a thin sheet of iron covering the edge of your mostly wooden plough.

So, why did Chinese have all of these iron agricultural tools centuries earlier than Europeans? Because their methods of iron (& steel) production were also centuries ahead.

Blast Furnace: "Like baking a sponge cake made of iron"

The Iron Age is considered to have begun around 1700-1500BC. To extract iron from an ore of iron oxide, the iron has to be separated from oxygen and other impurities in a high temperature process which takes carbon to extract the oxygen out of the ore as carbon dioxide. This is called 'smelting'.

The earliest smelting of iron ore was done at temperatures below the melting point of iron. This left a spongy mass of iron that needed to be shaped by hammering, a very work intensive process.

But some time around 600BC, Chinese developed a furnace that could create a heat intense enough to melt iron, the blast furnace. Once liquified, iron could be poured into casts already in the shape of tools that were needed. The iron casting industry was officially supported by dynastic governments, leading to widespread adoption of iron tools made to a standard.

Now a special note about the difference between iron and steel. Cast iron is very high in carbon content, making it hard but brittle. Steel is iron that has a perfect balance of carbon to retain an edge but also maintain just enough flexibility to avoid brittleness. Around 200BC, Chinese learned that if air was blown over iron as it was being cast the carbon content could be reduced and what you wound up with was steel. Around 600AD steel tools began to widely replace iron ones.

The earliest evidence of blast furnaces in Europe is 1100AD, with widespread adoption occurring in 1400AD. The process of creating steel I described above first appears in the western world in 1855, and there's some contention that the 'inventor' may have actually gotten the idea from Chinese workers in the US.

As another illustration of the difference in iron production, by 1078AD the foundries of northern China could produce 114,000 tons of iron a year. In 1788AD, England produced about 50,000 tons of iron.

Horse Collar: "Over 1,000 years of choking horses"

Imagine a horse pulling a plough. Now, how did you imagine that plough being attached to the horse? With a horse collar, right? Unfortunately for horses, before the collar was invented there was the throat-girth harness, which sounds as awful as it is. A plough (or any other load) attached by a throat-girth harness means that a horse is basically pulling with a noose around its trachea. Around 300BC, someone in China thought "What if the horse pulled with its chest instead of its throat?" and so the breast-strap harness was born and horses across China breathed a sigh of relief. This was improved on in 500AD with the horse collar as we know it.

The breast-strap harness appears in Russia in 700AD, and shows up further west in Norway around 800AD. The horse collar appears a bit later in 900AD, with widespread adoption by 1200AD.

The difference between China and Europe's population levels throughout history is the difference between their agricultural technology. China had time-saving, force-multiplying tools (that didn't strangle horses) for centuries, even millennia, before adoption in Europe.


2014-10-15
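National Day

1999, the 50th National Day: I danced in the group dance on the square.
2004, the 55th National Day: I organized the group dance on the square.
2009, the 60th National Day: I thought the round-decade commemorative coin was precious.
2014, the 65th National Day: I was in Canada, watching the news about Occupy Central in Hong Kong.
Where will I be in 2019?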

2014-10-02
How can I plan to become a billionaire in 2 years?

I know some of you would laugh at me, but I know myself that I have the instinct to become a billionaire. I am a freelance web developer working on a social network startup. How do I plan that? The timeline, project duration, etc… ;-)

View Question on Quora

2014-06-27
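Should We Still Do Big Data?

On May 14 I posted a message on Weibo, and the comments and private conversations that followed prompted a lot of reflection. The post read: "I have been using Hadoop since version 0.16, five years now, and I have always believed big data would be the key to winning in the future. But that future looks a long way off. Or perhaps the problems we face are different ones that the foreigners who invented big data find hard to understand, and we are already far ahead."

Big data has been getting hotter and hotter in recent years. I actually think it is overheated, so I am here to pour some cold water on it.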

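History

Most of the people who will read this are probably my former colleagues. You all know I did not come to this field late: in 2008 I started setting up Hadoop at Renren, on version 0.16.3. Back then hardly anyone talked about big data, and the first Hadoop book [1] was not published until June 2009. We had no clear idea how to use the thing either. The sole goal was to replace the "instrument the code and log for statistics" model, and right at the start we took the production service down three times.

The service back then was called ActiveLog. Each page view (PV) was recorded as one line, in a format very similar to the Apache Combined Log, and we collected the web servers' logs centrally on a single server (yes, half a year before Facebook open-sourced Scribe [2]). To solve the storage space problem we brought in Hadoop and spread the storage across several hundred servers. Because of exactly this architecture, whenever a MapReduce job took up too much switch bandwidth, it choked the production cluster.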
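
For readers who have not worked with this kind of log, here is a minimal sketch of parsing one line per page view. ActiveLog's exact format is not given here, so this parses the standard Apache Combined Log Format that it is said to resemble; the pattern name `COMBINED`, the function `parse_line`, and the sample line are illustrative assumptions.

```python
import re
from datetime import datetime

# Apache Combined Log Format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return one page-view record as a dict, or None if the line is malformed."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
record = parse_line(sample)
```

Because every page view is an independent line, the files split cleanly across machines, which is what made this format a natural fit for MapReduce.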

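As I recall, the earliest complete 24-hour log file is dated March 15, 2008. At that time the logs ran to 196 GB per day.

Of course, big data later took off. We gained more internal users, got dedicated servers and even a dedicated data center, with gigabit links straight to the core switches. By the time I left in 2012, the cluster had grown to 700 machines.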

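Reflection

When you drink the water, remember its source: the big data boom of these years brought dividends to the project, and my own career benefited from it too. But that cannot hide the concrete problems that kept arising one after another, and the scope of application has always been the one that troubled me most.

This sent me back to the source to re-examine the systems I had designed and the whole application stack, and only then did I post that Weibo message.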

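Data Volume

How much data dares to call itself big data? Wikipedia has this sentence: As of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set.[3]

In 2008 Renren's log data came to 6 TB a month; let's charitably call that half a dozen. Occasionally we also had to process a full year of data.

So I only dare say that I have worked with Hadoop; I really do not dare call it big data. Of the books and articles discussing big data today, how many of the authors have actually processed more than a petabyte of data? And how many clusters in China have petabyte-scale capacity?

For a few hundred GB, just use awk. A few TB can in fact be handled by a database.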
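
As an illustration of the single-machine approach, here is a minimal sketch of a streaming count, roughly what `awk '{c[$7]++} END {for (k in c) print c[k], k}'` does. The function name `count_by_field` and the sample lines are assumptions; field index 6 (awk's `$7`) is where the request path falls when a Combined Log line is split on whitespace.

```python
from collections import Counter


def count_by_field(lines, field_index=6):
    """Count occurrences of one whitespace-separated field, reading line by line.

    Memory use grows with the number of distinct keys, not with the input
    size, so the input can be far larger than RAM.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > field_index:  # skip short or malformed lines
            counts[fields[field_index]] += 1
    return counts


# Hypothetical sample; in practice `lines` would be an open log file or sys.stdin.
sample_lines = [
    '1.1.1.1 - - [15/Mar/2008:00:00:01 +0800] "GET /home HTTP/1.1" 200 512 "-" "UA"',
    '1.1.1.2 - - [15/Mar/2008:00:00:02 +0800] "GET /home HTTP/1.1" 200 512 "-" "UA"',
    '1.1.1.3 - - [15/Mar/2008:00:00:03 +0800] "GET /profile HTTP/1.1" 200 128 "-" "UA"',
]
top_pages = count_by_field(sample_lines).most_common(10)
```

A pass like this is bound by disk read speed and needs no cluster, scheduler, or network shuffle.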

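Scope of Application

Big data applications involve three things: 1) Distributed/parallel computing; 2) Data mining; 3) Business intelligence.

These three depend on one another: the direct demand comes from BI, the indirect demand comes from data mining, and the implementation rests on computing. Yet what I have felt first-hand in reality can be summed up with an old phrase: the transition from an extensive economy to an intensive one. It is still far too early to talk about the step after intensive, the green economy.

Our internet has a nearly inexhaustible supply of users, and companies that can go public by skirting the rules. Do we really care about data?

Once the hype around the big data concept is over, has it really been applied to the business and produced profit? Can it earn back its cost?

I know that PMs at most internet companies in China do not use data to make decisions. Before talking about big data, we should start with "small data".

I only came to feel this personally after I started working abroad. A week before I posted that Weibo message, I moved about 8% of my cash to open an account at another bank. The very next day, my account manager wanted to meet to discuss my "investment needs". Mind you, when I went to China Merchants Bank to close my Golden Sunflower (金葵花) account, nobody even asked why, and a month after the account was closed my account manager phoned to tell me that "because you are a Golden Sunflower customer, we would like to invite you to apply for the Centurion black card". What a gap.

But this is still "small data". If even these things cannot yet be done well, how is big data in China supposed to be done, who is it produced for, and who will actually look at it?

I actually think this problem has no solution. The market determines this extensive, rough-and-ready environment, and it will not change in the short term.

What can realistically be done is not to peddle big data concepts and technology, but to get "small data" genuinely put to use first.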
[1] http://wiki.apache.org/hadoop/Books
[2] http://en.wikipedia.org/wiki/Scribe_(log_server)
[3] http://en.wikipedia.org/wiki/Big_data

2014-05-21