{"id":119,"date":"2011-12-12T14:57:17","date_gmt":"2011-12-12T06:57:17","guid":{"rendered":"http:\/\/sinofool.com\/blog\/?p=119"},"modified":"2011-12-12T15:07:00","modified_gmt":"2011-12-12T07:07:00","slug":"%e4%bd%bf%e7%94%a8hive%e5%81%9a%e6%95%b0%e6%8d%ae%e5%88%86%e6%9e%90","status":"publish","type":"post","link":"https:\/\/sinofool.net\/blog\/archives\/119","title":{"rendered":"Data Analysis with Hive"},"content":{"rendered":"<p>After rolling out streaming-based data analysis at scale, we found that although this approach is cheap to get started with, its execution efficiency is just as low.<br \/>\nEvery map task has to start two processes on the TaskTracker: one Java process and one perl\/bash\/python process.<br \/>\nBoth input and output are copied one extra time.<\/p>\n<p>After a series of investigations, we began rewriting some of our streaming jobs in Hive.<\/p>\n<h1>What is Hive?<\/h1>\n<ol>\n<li>Hive is a SQL parsing engine that runs on a single machine; it does not itself run on Hadoop.<\/li>\n<li>Hive compiles SQL into MapReduce jobs, which run on Hadoop.<\/li>\n<li>Using Hive lowers communication costs, because SQL syntax is widely known.<\/li>\n<li>The jobs Hive generates are reasonably efficient, but still slower than hand-optimized pure MapReduce jobs.<\/li>\n<\/ol>\n<h1>Data Preparation<\/h1>\n<p>The raw log file looks like this:<br \/>\n<code>1323431269786 202911262 RE_223500512 AT_BLOG_788514510 REPLY 
BLOG_788514510_202911262<\/code><\/p>\n<div>The fields are, in order: &lt;time&gt; &lt;actor&gt; [[description] [description]&#8230;] &lt;action&gt; &lt;entity&gt;<br \/>\nFor the example above, they are:<\/div>\n<div>\n<ul>\n<li>&lt;time&gt;: 1323431269786<\/li>\n<li>&lt;actor&gt;: 202911262<\/li>\n<li>[description]: RE_223500512<\/li>\n<li>[description]: AT_BLOG_788514510<\/li>\n<li>&lt;action&gt;: REPLY<\/li>\n<li>&lt;entity&gt;: BLOG_788514510_202911262<\/li>\n<\/ul>\n<\/div>\n<h1>Extending Hive's Deserializer<\/h1>\n<p>To analyze the data with SQL, Hive must know how to split each log line into fields. Hive provides an interface that lets us supply our own serialization and deserialization methods.<br \/>\n<code lang=\"java\"><\/p>\n<p>import java.util.Properties;<\/p>\n<p>import org.apache.hadoop.conf.Configuration;<br \/>\nimport org.apache.hadoop.hive.serde2.Deserializer;<br \/>\nimport org.apache.hadoop.hive.serde2.SerDeException;<br \/>\nimport org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;<br \/>\nimport org.apache.hadoop.io.Writable;<\/p>\n<p>public class RawActionDeserializer implements Deserializer {<\/p>\n<p>  @Override<br \/>\n  public Object deserialize(Writable obj) throws SerDeException {<br \/>\n    \/\/ TODO Auto-generated method stub<br \/>\n    return null;<br \/>\n  }<\/p>\n<p>  @Override<br \/>\n  public ObjectInspector getObjectInspector() throws SerDeException {<br \/>\n    \/\/ TODO Auto-generated method stub<br \/>\n    return null;<br \/>\n  }<\/p>\n<p>  @Override<br \/>\n  public void initialize(Configuration conf, Properties props)<br \/>\n      throws SerDeException {<br \/>\n    \/\/ TODO Auto-generated method 
stub<\/p>\n<p>  }<\/p>\n<p>}<br \/>\n<\/code><br \/>\nThe three methods do the following:<\/p>\n<ul>\n<li>initialize: called at startup; adjusts behavior or allocates resources according to runtime parameters.<\/li>\n<li>getObjectInspector: returns the field names and types.<\/li>\n<li>deserialize: deserializes each row of data and returns the result.<\/li>\n<\/ul>\n<h1>Defining the Table Schema<\/h1>\n<p>In this example the fields have fixed meanings, so there is nothing to configure at runtime in the initialize method. We declare the field definitions as static, as follows.<br \/>\n<code lang=\"java\"><br \/>\n  private static List&lt;String&gt; structFieldNames = new ArrayList&lt;String&gt;();<\/p>\n<p>  private static List&lt;ObjectInspector&gt; structFieldObjectInspectors = new ArrayList&lt;ObjectInspector&gt;();<br \/>\n  static {<br \/>\n    \/\/ time: long, shown by Hive as bigint<br \/>\n    structFieldNames.add(\"time\");<br \/>\n    structFieldObjectInspectors.add(ObjectInspectorFactory<br \/>\n        .getReflectionObjectInspector(Long.TYPE, ObjectInspectorOptions.JAVA));<\/p>\n<p>    \/\/ id: int<br \/>\n    structFieldNames.add(\"id\");<br \/>\n    structFieldObjectInspectors.add(ObjectInspectorFactory<br \/>\n        .getReflectionObjectInspector(<br \/>\n            java.lang.Integer.TYPE, ObjectInspectorOptions.JAVA));<\/p>\n<p>    \/\/ adv: list of strings, shown by Hive as array&lt;string&gt;<br \/>\n    structFieldNames.add(\"adv\");<br \/>\n    structFieldObjectInspectors.add(ObjectInspectorFactory<br \/>\n        .getStandardListObjectInspector(<br \/>\n            ObjectInspectorFactory.getReflectionObjectInspector(<br \/>\n                String.class, ObjectInspectorOptions.JAVA)));<\/p>\n<p>    \/\/ verb: string<br \/>\n    structFieldNames.add(\"verb\");<br \/>\n    structFieldObjectInspectors<br \/>\n        
.add(ObjectInspectorFactory.getReflectionObjectInspector(<br \/>\n            String.class, ObjectInspectorOptions.JAVA));<\/p>\n<p>    structFieldNames.add(\"obj\");<br \/>\n    structFieldObjectInspectors<br \/>\n        .add(ObjectInspectorFactory.getReflectionObjectInspector(<br \/>\n            String.class, ObjectInspectorOptions.JAVA));<br \/>\n  }<\/p>\n<p>  @Override<br \/>\n  public ObjectInspector getObjectInspector() throws SerDeException {<br \/>\n    return ObjectInspectorFactory.getStandardStructObjectInspector(<br \/>\n        structFieldNames, structFieldObjectInspectors);<br \/>\n  }<br \/>\n<\/code><\/p>\n<h1>Defining the Parsing Function<\/h1>\n<p>So that Java MapReduce jobs can reuse the parsing code, we implemented it in a separate class that is independent of Hive; that code is not shown here. The class defines member variables matching the log fields and provides a static valueOf method that constructs an instance from a string.<br \/>\n<code lang=\"java\"><br \/>\n@Override<br \/>\npublic Object deserialize(Writable blob) throws SerDeException {<br \/>\n  if (blob instanceof Text) {<br \/>\n    String line = ((Text) blob).toString();<br \/>\n    RawAction act = RawAction.valueOf(line);<br \/>\n    \/\/ valueOf returned null: there is no row to build for this line<br \/>\n    if (act == null)<br \/>\n      return null;<br \/>\n    \/\/ Values must follow the declared field order: time, id, adv, verb, obj<br \/>\n    List&lt;Object&gt; result = new ArrayList&lt;Object&gt;();<br \/>\n    result.add(act.getTime());<br \/>\n    result.add(act.getUserId());<br \/>\n    result.add(act.getAdv());<br \/>\n    result.add(act.getVerb());<br \/>\n    result.add(act.getObj());<br \/>\n    return result;<br \/>\n  }<br \/>\n  return null;<br \/>\n}<br \/>\n<\/code><br \/>\nCreating the Table<\/p>\n<p>After compiling the program above and uploading it to the Hive deployment directory, start hive:<br \/>\n<code 
lang=\"bash\"><br \/>\n$ .\/hive --auxpath \/home\/bochun.bai\/dp-base-1.0-SNAPSHOT.jar<br \/>\n<\/code><br \/>\n<code lang=\"sql\"><br \/>\nhive> CREATE TABLE ac_raw ROW FORMAT SERDE 'com.renren.dp.hive.RawActionDeserializer';<br \/>\nOK<br \/>\nTime taken: 0.117 seconds<br \/>\nhive> DESC ac_raw;<br \/>\nOK<br \/>\ntime\tbigint\tfrom deserializer<br \/>\nid\tint\tfrom deserializer<br \/>\nadv\tarray&lt;string&gt;\tfrom deserializer<br \/>\nverb\tstring\tfrom deserializer<br \/>\nobj\tstring\tfrom deserializer<br \/>\nTime taken: 0.145 seconds<br \/>\n<\/code><br \/>\n<code lang=\"sql\"><br \/>\nhive> LOAD DATA INPATH '\/user\/bochun.bai\/hivedemo\/raw_action' OVERWRITE INTO TABLE ac_raw;<br \/>\nLoading data to table default.ac_raw<br \/>\nDeleted hdfs:\/\/NAMENODE\/user\/bochun.bai\/warehouse\/ac_raw<br \/>\nOK<br \/>\nTime taken: 0.173 seconds<br \/>\n<\/code><br \/>\n<code lang=\"sql\"><br \/>\nhive> SELECT count(1) FROM ac_raw;<br \/>\n...... (a lot of MapReduce progress output) ......<br \/>\nOK<br \/>\n332<br \/>\nTime taken: 15.404 seconds<br \/>\n<\/code><br \/>\n<code lang=\"sql\"><br \/>\nhive> SELECT count(1) as cnt, verb FROM ac_raw GROUP BY verb;<br \/>\n...... (a lot of MapReduce progress output) ......<br \/>\nOK<br \/>\n4\tADD_FOOTPRINT<br \/>\n1\tREPLY<br \/>\n24\tSHARE_BLOG<br \/>\n299\tVISIT<br \/>\n4\tadd_like<br \/>\nTime taken: 15.242 seconds<br \/>\n<\/code><\/p>\n","protected":false},"excerpt":{"rendered":"<p>After rolling out streaming-based data analysis at scale, we found that although this approach is cheap to get started with, its execution efficiency is just as low. Every map task has to start two processes on the TaskTracker: one Java process and one perl\/bash\/python process. Both input and output are copied one extra time. 
After a series of investigations, we began rewriting some of our streaming jobs in Hive. What is Hive? Hive is a SQL parsing engine that runs on a single machine; it does not itself run on Hadoop. Hive compiles SQL into MapReduce jobs, which run on Hadoop. Using Hive lowers communication costs, because SQL syntax is widely known. The jobs Hive generates are reasonably efficient, but still slower than hand-optimized pure MapReduce jobs. Data Preparation The raw log file looks like this: 1323431269786 202911262 RE_223500512 AT_BLOG_788514510 REPLY BLOG_788514510_202911262 The fields are, in order: &lt;time&gt; &lt;actor&gt; [[description] [description]&#8230;] &lt;action&gt; &lt;entity&gt; For the example above, they are: &lt;time&gt;: 1323431269786 &lt;actor&gt;: 202911262 [description]: RE_223500512 [description]: AT_BLOG_788514510 &lt;action&gt;: REPLY &lt;entity&gt;: BLOG_788514510_202911262 Extending Hive's Deserializer To analyze the data with SQL, Hive must know how to split each log line into fields. Hive provides an interface that lets us supply our own serialization and deserialization methods. import java.util.Properties; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.hive.serde2.Deserializer; import org.apache.hadoop.hive.serde2.SerDeException; import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; import 
org.apache.hadoop.io.Writable; public class RawActionDeserializer [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":4,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[9,4,5],"tags":[11],"class_list":["post-119","post","type-post","status-publish","format-standard","hentry","category-hadoop","category-renren","category-tech","tag-hadoop-hive-streaming-serde-deserializer"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/sinofool.net\/blog\/wp-json\/wp\/v2\/posts\/119","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sinofool.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sinofool.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sinofool.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sinofool.net\/blog\/wp-json\/wp\/v2\/comments?post=119"}],"version-history":[{"count":13,"href":"https:\/\/sinofool.net\/blog\/wp-json\/wp\/v2\/posts\/119\/revisions"}],"predecessor-version":[{"id":132,"href":"https:\/\/sinofool.net\/blog\/wp-json\/wp\/v2\/posts\/119\/revisions\/132"}],"wp:attachment":[{"href":"https:\/\/sinofool.net\/blog\/wp-json\/wp\/v2\/media?parent=119"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sinofool.net\/blog\/wp-json\/wp\/v2\/categories?post=119"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sinofool.net\/blog\/wp-json\/wp\/v2\/tags?post=119"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}