Setting Up a MapReduce Programming Environment
Preparation
- The Hadoop package: https://dlcdn.apache.org/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz
  Extract it and keep the extracted directory somewhere convenient.
- A modern IDE: IntelliJ IDEA 2023.2.3
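Creating the Project
New project
In the New Project dialog, select New Project (not the Maven entry under Generators), set Language to Java, and set Build system to Maven.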
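Adding Dependencies
From the gear icon in the top-right corner, open Project Structure. Under Project Settings > Modules, open the Dependencies tab and add the following files or directories: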
hadoop-3.2.4\share\hadoop\common
hadoop-3.2.4\share\hadoop\common\lib
hadoop-3.2.4\share\hadoop\mapreduce
hadoop-3.2.4\share\hadoop\mapreduce\lib
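A quick way to confirm that the four directories above were picked up is to ask the JVM whether it can see one class from each Hadoop module. The following is a minimal sketch, not part of the original setup: the class name `HadoopClasspathCheck` and the helper `isOnClasspath` are hypothetical, and the two class names probed are the ones WordCount itself imports.

```java
// Hypothetical helper: reports whether the Hadoop classes used by WordCount
// are visible on the classpath of the current IntelliJ module.
class HadoopClasspathCheck {

    // Returns true if the named class can be located (without initializing it).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.conf.Configuration", // shipped under share\hadoop\common
            "org.apache.hadoop.mapreduce.Job"       // shipped under share\hadoop\mapreduce
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If either line reports MISSING, the corresponding directory was most likely not added to the module's dependencies.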
Test Code
Test run in the IDE
Under \src\main\java, create a new Java class named WordCount and fill in the following:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (such as -D) and keep the job's own arguments.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // Summing is associative and commutative, so the reducer also serves as the combiner.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Every argument except the last is an input path; the last is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    // Emits (word, 1) for every whitespace-separated token in an input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}
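Run the project. Because no arguments are passed, it prints the usage message and exits:
Usage: wordcount <in> [<in>...] <out>
The error HADOOP_HOME and hadoop.home.dir are unset may also appear. It is harmless for this check, whose only purpose is to confirm that the code compiles and starts.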
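The message names the two places Hadoop looks for an installation directory: the Java system property `hadoop.home.dir` and the environment variable `HADOOP_HOME`. To see what the IDE's run configuration actually provides, the lookup can be imitated in plain Java. This is a simplified sketch under the assumption that the system property is consulted first and the environment variable second; the class `HadoopHomeProbe` and its `resolve` method are hypothetical and do not call into Hadoop.

```java
import java.util.Optional;

// Hypothetical probe: imitates, in simplified form, the lookup behind the
// "HADOOP_HOME and hadoop.home.dir are unset" message. It does not use Hadoop itself.
class HadoopHomeProbe {

    // Prefers the system property value; falls back to the environment value.
    static Optional<String> resolve(String sysProp, String envVar) {
        return Optional.ofNullable(sysProp != null ? sysProp : envVar);
    }

    public static void main(String[] args) {
        Optional<String> home = resolve(
                System.getProperty("hadoop.home.dir"),
                System.getenv("HADOOP_HOME"));
        System.out.println(home
                .map(h -> "Hadoop home resolved to: " + h)
                .orElse("Neither hadoop.home.dir nor HADOOP_HOME is set"));
    }
}
```

Running it with the same run configuration as WordCount shows whether either value reaches the JVM.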
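Packaging the JAR
From the gear icon in the top-right corner, open Project Structure. Under Project Settings > Artifacts, click the plus sign and choose JAR > From modules with dependencies. Set Main Class to the WordCount class written above, click OK, and return to the editor.
In the main menu, choose Build > Build Artifacts..., then choose Build.
If nothing goes wrong, the generated JAR file appears in the output directory.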
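The run command used later in this guide passes no class name to `hadoop jar`, so it relies on the JAR's manifest naming WordCount in its `Main-Class` entry, which is what the Main Class field in the artifact dialog should produce. A minimal sketch for checking this before uploading, using only `java.util.jar` from the JDK; the class name `JarMainClassCheck` and the method `mainClassOf` are hypothetical.

```java
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical checker: prints the Main-Class entry of a JAR's manifest.
class JarMainClassCheck {

    // Returns the Main-Class attribute, or null if the JAR has no manifest
    // or the manifest has no such attribute.
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            return manifest == null ? null : manifest.getMainAttributes().getValue("Main-Class");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: JarMainClassCheck <path-to-jar>");
            return;
        }
        String mainClass = mainClassOf(args[0]);
        System.out.println(mainClass == null
                ? "No Main-Class entry found"
                : "Main-Class: " + mainClass);
    }
}
```

If no entry is found, `hadoop jar` would need the class name as an extra argument right after the JAR path.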
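Server-Side Test
Upload the generated JAR to the server (the commands below assume it is at /home/word.jar).
Start the cluster: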
start-all.sh
Create 1.txt containing a few English words and upload it to HDFS:
hdfs dfs -put /home/1.txt /input
Run the test:
hadoop jar /home/word.jar /input /output
If the word counts show up under /output (for example, via hdfs dfs -cat /output/part-r-00000), the setup works.
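To judge whether the counts in the job's output are right, it helps to have a reference tally computed without Hadoop. The sketch below applies the same whitespace tokenization and per-word summation as the mapper and reducer, in plain Java, and prints one word and its count per line separated by a tab. The class `LocalWordCount` and the sample sentence are hypothetical; passing the path of a local copy of 1.txt as the first argument tallies that file instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local reference: counts words the same way the job does, without Hadoop.
class LocalWordCount {

    // Splits on whitespace (StringTokenizer's default delimiters) and sums per word.
    // A TreeMap keeps the words sorted, so the listing is easy to compare by eye.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Sample text is an assumption; pass a local file path to count that file instead.
        String text = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "hello hadoop hello mapreduce";
        count(text).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

With the built-in sample it prints hadoop 1, hello 2, and mapreduce 1, each on its own line.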