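Goal: simulate MapReduce (MR) in Python to compute the highest temperature for each year.
1. Inspect the data file. The year and the temperature must be extracted from each record to produce key-value pairs.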
[tianyc@TeletekHbase python]$ cat test.dat
0067011990999991950051507004...9999999N9+00001+99999999999...
0043011990999991950051512004...9999999N9+00221+99999999999...
0043011990999991950051518004...9999999N9-00111+99999999999...
0043012650999991949032412004...0500001N9+01111+99999999999...
0043012650999991949032418004...0500001N9+00781+99999999999...
2. Write the mapper, which prints the key-value pairs.
[tianyc@TeletekHbase python]$ cat map.py
import sys

for line in sys.stdin:
    val = line.strip()
    # Fixed-width record: year at offsets 15-18, signed temperature at offsets 40-44 (0-based).
    (year, temp) = (val[15:19], val[40:45])
    print("%s\t%s" % (year, temp))
[tianyc@TeletekHbase python]$ cat test.dat|python map.py
1950	+0000
1950	+0022
1950	-0011
1949	+0111
1949	+0078
3. Sort the results.
[tianyc@TeletekHbase python]$ cat test.dat|python map.py |sort
1949	+0078
1949	+0111
1950	+0000
1950	-0011
1950	+0022
4. Write the reducer, which processes the mapper's intermediate output and produces the final result.
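[tianyc@TeletekHbase python]$ cat red.py
import sys

# Input arrives sorted by key, so all values for one year are adjacent.
(last_key, max_val) = (None, None)
for line in sys.stdin:
    (key, val) = line.strip().split('\t')
    if last_key != key:
        # Key changed: emit the finished year (if any) and start the new one.
        if last_key is not None:
            print('%s\t%s' % (last_key, max_val))
        (last_key, max_val) = (key, int(val))
    else:
        max_val = max(max_val, int(val))
# Emit the final year, which is not followed by a key change.
if last_key is not None:
    print('%s\t%s' % (last_key, max_val))
5. Run the pipeline.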
[tianyc@TeletekHbase python]$ cat test.dat|python map.py |sort|python red.py
1949	111
1950	22
For follow-up tests, see
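The map, sort, and reduce stages chained with shell pipes above can also be sketched inside a single Python process, which may be handy for quick experiments. This is only an illustrative sketch, not the scripts used in this post: `make_record`, `map_line`, `reduce_pairs`, and `max_temp_by_year` are hypothetical names, and the records are synthetic strings built so that the year and temperature sit at the same offsets that map.py slices (the real records in test.dat are shown with elisions, so they are not reproduced here).

```python
from itertools import groupby


def make_record(year, temp):
    """Hypothetical helper: build a synthetic fixed-width record with the
    year at offsets 15-18 and the signed temperature at offsets 40-44."""
    return "0" * 15 + year + "0" * 21 + temp + "1"


def map_line(line):
    """Map step: slice out (year, temperature) the same way map.py does."""
    val = line.strip()
    return val[15:19], val[40:45]


def reduce_pairs(sorted_pairs):
    """Reduce step: for each run of equal keys, keep the maximum value."""
    for year, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield year, max(int(temp) for _, temp in group)


def max_temp_by_year(lines):
    # sorted() stands in for the shell `sort` between the two stages.
    pairs = sorted(map_line(line) for line in lines)
    return list(reduce_pairs(pairs))


# Synthetic records carrying the same (year, temperature) values as the post.
records = [
    make_record("1950", "+0000"),
    make_record("1950", "+0022"),
    make_record("1950", "-0011"),
    make_record("1949", "+0111"),
    make_record("1949", "+0078"),
]

if __name__ == "__main__":
    for year, temp in max_temp_by_year(records):
        print("%s\t%s" % (year, temp))
```

`itertools.groupby` only groups adjacent items, so it relies on the input being sorted by key, which is the same assumption red.py makes about its input.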