[Original] MapReduce in Practice (Part 1)

Date: 2021-08-06  Category: Programmer's Life

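Every day, users perform all kinds of actions on a website, such as browsing pages and placing orders. The site records these actions as user behavior logs, which are stored on HDFS. Each log line looks like this: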

17:03:35.012ᄑpageviewᄑ{"device_id":"4405c39e85274857bbef58e013a08859","user_id":"0921528165741295","ip":"61.53.69.195","session_id":"9d6dc377216249e4a8f33a44eef7576d","req_url":"http://www.bigdataclass.com/product/1527235438747427"}

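This is JSON-like but only partly structured data describing a user's visit to the site. The JSON part carries the attributes device_id, user_id, ip, session_id and req_url, and it is preceded by a plain-text prefix such as 17:03:20.586ᄑpageviewᄑ (a time of day followed by an event name). We want a MapReduce program to turn this text into a format the data warehouse can read, for example one JSON string per record, in which time_log is the event time in epoch milliseconds and active_name is the event name taken from the prefix. The output looks like this (this sample comes from a different record than the log line above):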

{"time_log":1527584600586,"device_id":"4405c39e85274857bbef58e013a08859","user_id":"0921528165741295","active_name":"pageview","ip":"61.53.69.195","session_id":"9d6dc377216249e4a8f33a44eef7576d","req_url":"http://www.bigdataclass.com/my/0921528165741295"}
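The transformation described above can be sketched in plain Java. This is a minimal sketch, not the author's implementation: the project's pom.xml declares fastjson, which the real job presumably uses to parse the payload, whereas this version splices the two new fields into the JSON text so it needs only the JDK. The class and method names are hypothetical, and the field delimiter is assumed to be the character ᄑ (U+1111) seen in the sample. Because a log line holds only a time of day, the date has to be supplied from elsewhere; the sample time_log 1527584600586 equals 2018-05-29 17:03:20.586 in UTC+8, so the sketch also assumes the logs use China Standard Time (Asia/Shanghai).

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.format.DateTimeParseException;

/**
 * Hypothetical helper, not the article's code: turns one raw log line into the
 * JSON record expected by the data warehouse, using only the JDK.
 */
class LogLineConverter {

    // Assumption: fields are separated by the character 'ᄑ' (U+1111) seen in the sample.
    static final String DELIMITER = "\u1111";

    // Assumption: log times are China Standard Time, which matches the sample time_log.
    static final ZoneId LOG_ZONE = ZoneId.of("Asia/Shanghai");

    /** Combines a date supplied from outside the line with a time such as "17:03:20.586". */
    static long toEpochMillis(LocalDate date, String timeOfDay) {
        return LocalTime.parse(timeOfDay)   // ISO format HH:mm:ss.SSS
                .atDate(date)
                .atZone(LOG_ZONE)
                .toInstant()
                .toEpochMilli();
    }

    /**
     * Converts "timeᄑeventᄑ{json}" into a JSON object with "time_log" and
     * "active_name" placed before the original fields. Returns null for
     * malformed lines so that a caller (e.g. a mapper) can skip them.
     */
    static String convert(String line, LocalDate date) {
        String[] fields = line.split(DELIMITER, 3);   // limit 3: never split inside the payload
        if (fields.length != 3) {
            return null;
        }
        String payload = fields[2].trim();
        if (!payload.startsWith("{") || !payload.endsWith("}")) {
            return null;
        }
        long timeLog;
        try {
            timeLog = toEpochMillis(date, fields[0].trim());
        } catch (DateTimeParseException e) {
            return null;
        }
        String rest = payload.substring(1).trim();          // original fields plus closing brace
        String separator = rest.startsWith("}") ? "" : ","; // no trailing comma for "{}"
        // Assumes the event name needs no JSON escaping (e.g. "pageview").
        return "{\"time_log\":" + timeLog
                + ",\"active_name\":\"" + fields[1].trim() + "\""
                + separator + rest;
    }

    public static void main(String[] args) {
        String line = "17:03:20.586\u1111pageview\u1111"
                + "{\"device_id\":\"4405c39e85274857bbef58e013a08859\",\"ip\":\"61.53.69.195\"}";
        System.out.println(convert(line, LocalDate.of(2018, 5, 29)));
    }
}
```

For the date 2018-05-29 the sketch yields time_log 1527584600586, matching the sample output above. The field order differs from that sample, which does not matter for a JSON object. In a MapReduce job, a method like convert would typically be called once per input line inside the Mapper, writing out only the non-null results.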

Tools: IntelliJ IDEA, Maven, JDK 1.8

Steps:

Configure pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>netease.bigdata.course</groupId>
    <artifactId>etl</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <!-- Compile for JDK 1.8, the JDK used in this project -->
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <!-- provided: the Hadoop cluster supplies these classes at runtime,
             so they are not packed into the assembled jar -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.6</version>
            <scope>provided</scope>
        </dependency>
        <!-- fastjson (JSON parsing and serialization) is packed into the assembled jar -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.4</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main</sourceDirectory>
        <plugins>
            <!-- "mvn package" also builds etl-1.0-SNAPSHOT-jar-with-dependencies.jar under target/ -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

Please credit the source when reposting: https://www.heiqu.com/zyffgj.html
