7.4.2 The MapFile Class

MapFile is used in much the same way as SequenceFile. The following program creates a MapFile:

MapFileWriteFile.java

package cn.edn.ruc.cloudcomputing.book.chapter07;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;

public class MapFileWriteFile {

    private static final String[] myValue = {
        "hello world",
        "bye world",
        "hello hadoop",
        "bye hadoop"
    };

    public static void main(String[] args) throws IOException {
        // Replace with the directory in which the MapFile should be created
        String uri = "location where you want to create the MapFile";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        IntWritable key = new IntWritable();
        Text value = new Text();
        MapFile.Writer writer = null;
        try {
            writer = new MapFile.Writer(conf, fs, uri,
                    key.getClass(), value.getClass());
            // Keys must be appended in ascending order
            for (int i = 0; i < 500; i++) {
                key.set(i);
                value.set(myValue[i % myValue.length]);
                writer.append(key, value);
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}

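This program closely resembles the one that creates a SequenceFile, so it is not explained in detail here. Unlike SequenceFile, which produces a single file, this program produces a directory, as shown below: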

-rw-r--r--  *  supergroup  16018  /user/root/MapFileOutput/data

-rw-r--r--  *  supergroup  227  /user/root/MapFileOutput/index

Here data holds the stored records, that is, the MapFile contents (a sorted SequenceFile), and index is the index. For this program, the index contains the following:

0 128

128 4200

256 8272

384 12344

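As the listing shows, an index entry is created for every 128 keys; this interval can be changed through io.map.index.interval. The number after each key is an offset that records where that key is located in the data file.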
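The effect of the index interval can be sketched without Hadoop. The class below is a simplified, self-contained model, not Hadoop code: IndexIntervalSketch and indexedKeys are hypothetical names, and the model assumes that the first record and every interval-th record after it receive an index entry, which matches the keys shown in the listing above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a sparse index (hypothetical class, not part of Hadoop).
public class IndexIntervalSketch {

    // Returns the keys that receive an index entry when the keys are
    // 0..recordCount-1 and one entry is written per `interval` records.
    public static List<Integer> indexedKeys(int recordCount, int interval) {
        List<Integer> keys = new ArrayList<>();
        for (int key = 0; key < recordCount; key++) {
            if (key % interval == 0) {
                keys.add(key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // 500 records with the interval of 128 seen in this section
        System.out.println(indexedKeys(500, 128));       // [0, 128, 256, 384]
        // Halving the interval doubles the number of index entries
        System.out.println(indexedKeys(500, 64).size()); // 8
    }
}
```

A smaller interval yields more index entries and shorter scans in the data file, at the cost of a larger index; a larger interval does the opposite.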

The program that reads a MapFile is also simple:

package cn.edn.ruc.cloudcomputing.book.chapter07;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.util.ReflectionUtils;

public class MapFileReadFile {

    public static void main(String[] args) throws IOException {
        // Replace with the directory of the MapFile to read
        String uri = "location of the MapFile you want to read";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        MapFile.Reader reader = null;
        try {
            reader = new MapFile.Reader(fs, uri, conf);
            // Create key and value objects of the types stored in the file
            WritableComparable key = (WritableComparable)
                    ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable)
                    ReflectionUtils.newInstance(reader.getValueClass(), conf);
            // Read every record in key order
            while (reader.next(key, value)) {
                System.out.printf("%s\t%s\n", key, value);
            }
            // Look up the value stored for the single key 7
            reader.get(new IntWritable(7), value);
            System.out.printf("%s\n", value);
        } finally {
            IOUtils.closeStream(reader);
        }
    }
}

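What sets MapFile apart is that it can look up the value associated with a single key, as the call reader.get(new IntWritable(7), value) in the program above does.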

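To perform this operation, MapFile.Reader first loads the index into memory and then runs a simple binary search to locate the data. During the lookup, it first finds in the index the largest index key that is no greater than the key being sought, and then scans forward in the data file from that entry's offset.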
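The two-step lookup described above (binary search over the in-memory index, then a forward scan through the data) can be sketched as follows. This is a simplified model under stated assumptions rather than Hadoop's implementation: SparseIndexLookupSketch and its methods are hypothetical names, two arrays stand in for the data file, and the index stores array positions where a real index would store byte offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a sparse-index lookup (hypothetical, not Hadoop code).
public class SparseIndexLookupSketch {

    private final int[] dataKeys;       // sorted keys, standing in for the data file
    private final String[] dataValues;  // value stored with each key
    private final List<Integer> indexKeys = new ArrayList<>();      // every interval-th key
    private final List<Integer> indexPositions = new ArrayList<>(); // its position in dataKeys

    public SparseIndexLookupSketch(int[] sortedKeys, String[] values, int interval) {
        this.dataKeys = sortedKeys;
        this.dataValues = values;
        for (int pos = 0; pos < sortedKeys.length; pos += interval) {
            indexKeys.add(sortedKeys[pos]);
            indexPositions.add(pos);
        }
    }

    // Returns the value stored for key, or null if the key is absent.
    public String get(int key) {
        // Step 1: binary search for the last index entry whose key is <= key.
        int lo = 0, hi = indexKeys.size() - 1, slot = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexKeys.get(mid) <= key) {
                slot = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (slot < 0) {
            return null; // key is smaller than every key in the file
        }
        // Step 2: scan forward in the data from the indexed position.
        for (int pos = indexPositions.get(slot);
             pos < dataKeys.length && dataKeys[pos] <= key; pos++) {
            if (dataKeys[pos] == key) {
                return dataValues[pos];
            }
        }
        return null;
    }

    // Builds the same 500 records that the writer program in this section produces.
    public static SparseIndexLookupSketch sample() {
        String[] myValue = {"hello world", "bye world", "hello hadoop", "bye hadoop"};
        int[] keys = new int[500];
        String[] values = new String[500];
        for (int i = 0; i < 500; i++) {
            keys[i] = i;
            values[i] = myValue[i % myValue.length];
        }
        return new SparseIndexLookupSketch(keys, values, 128);
    }

    public static void main(String[] args) {
        SparseIndexLookupSketch demo = sample();
        System.out.println(demo.get(7));   // bye hadoop
        System.out.println(demo.get(500)); // null
    }
}
```

Because only every 128th key is held in the index, a lookup in this model examines at most 128 records after the binary search.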

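The index of a large MapFile usually takes up a lot of memory. One remedy is to rebuild the index with a larger index interval, which shrinks the index file, but rebuilding an index is tedious. Hadoop offers another, very effective option: when reading the index file, the reader can load only one index key out of every few, which reduces the amount of index data held in memory. The number of keys to skip is set through io.map.index.skip.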
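The memory saving from skipping index keys can be illustrated with a small model. This is a sketch under assumptions, not Hadoop's reader: IndexSkipSketch, loadIndex, and recordsPerRetainedEntry are hypothetical names, and the model keeps one entry and then skips the next `skip` entries; which exact entries a real reader retains may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of loading a thinned index (hypothetical, not Hadoop code).
public class IndexSkipSketch {

    // Keeps one on-disk index key, then skips the next `skip` keys, and repeats.
    public static List<Integer> loadIndex(List<Integer> onDiskIndexKeys, int skip) {
        List<Integer> inMemory = new ArrayList<>();
        for (int i = 0; i < onDiskIndexKeys.size(); i += skip + 1) {
            inMemory.add(onDiskIndexKeys.get(i));
        }
        return inMemory;
    }

    // Number of data records covered by each retained index entry in this model.
    public static int recordsPerRetainedEntry(int interval, int skip) {
        return interval * (skip + 1);
    }

    public static void main(String[] args) {
        // Index keys from this section's example (interval 128, 500 records)
        List<Integer> onDisk = Arrays.asList(0, 128, 256, 384);
        System.out.println(loadIndex(onDisk, 0));            // [0, 128, 256, 384]
        System.out.println(loadIndex(onDisk, 1));            // [0, 256]
        System.out.println(recordsPerRetainedEntry(128, 1)); // 256
    }
}
```

The trade-off is visible in the last line: with fewer entries in memory, each lookup may have to scan more records in the data file.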
