14.2.2 Downloading, Installing, and Configuring Pig
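At the time of writing, the latest Pig release is 0.10.0. Earlier releases such as 0.9.2 and 0.8.1 are also available, and you can download whichever version you need from the official Apache website. This book uses Pig 0.10.0, which can be downloaded from: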
http://www.apache.org/dyn/closer.cgi/pig
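After the download completes, unpack the archive with the command tar -xvf pig-*.tar.gz. Pig can be placed anywhere on the system; it only needs the corresponding environment variables to be set before it can be used. We nevertheless recommend placing Pig under the Hadoop directory, which makes later operations more convenient.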
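The extraction step can be sketched as a small shell snippet. This is only an illustration under stated assumptions: the archive name pig-0.10.0.tar.gz, the variables PIG_TARBALL and INSTALL_PARENT, and the helper extract_pig are names introduced here, and the target directory ~/hadoop-1.0.1 is taken from the shell prompt shown later in this section.

```shell
# Assumed names, not fixed by Pig itself: adjust them to your own layout.
PIG_TARBALL="${PIG_TARBALL:-pig-0.10.0.tar.gz}"          # archive downloaded from an Apache mirror
INSTALL_PARENT="${INSTALL_PARENT:-$HOME/hadoop-1.0.1}"   # Hadoop directory, as recommended above

# extract_pig is a hypothetical helper.
# $1 = path to the archive, $2 = directory to unpack into.
extract_pig() {
    mkdir -p "$2" && tar -xf "$1" -C "$2"
}

# Unpack only when the archive is actually present in the current directory.
if [ -f "$PIG_TARBALL" ]; then
    extract_pig "$PIG_TARBALL" "$INSTALL_PARENT"
fi
```

With these defaults, Pig would end up in ~/hadoop-1.0.1/pig-0.10.0, assuming the archive unpacks into a top-level pig-0.10.0 directory.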
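Once the archive is unpacked, set the environment variables for Pig. There are several ways to set environment variables, and you can choose whichever suits you. Here we edit the profile file: open /etc/profile, add the following two lines, then save and close the file. The new settings take effect at the next login; restarting the system also works, and running source /etc/profile applies them to the current shell: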
export PIG_HOME=/<path-to-pigDir>
export PATH=$PIG_HOME/bin:$PIG_HOME/conf:$PATH
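As a rough sketch, the same two settings can be tried in the current shell session before committing them to /etc/profile, followed by a quick check that the pig launcher is reachable. The default path below is an assumption (it matches the install location used in this section); replace it with your own Pig directory.

```shell
# Assumed install location; override by setting PIG_HOME beforehand.
PIG_HOME="${PIG_HOME:-$HOME/hadoop-1.0.1/pig-0.10.0}"
export PIG_HOME
export PATH="$PIG_HOME/bin:$PIG_HOME/conf:$PATH"

# Report whether the pig launcher can now be found through PATH.
if command -v pig >/dev/null 2>&1; then
    echo "pig found at: $(command -v pig)"
else
    echo "pig not found on PATH; check PIG_HOME=$PIG_HOME" >&2
fi
```

Settings made this way last only for the current session, which makes them convenient for testing; the /etc/profile entries are what make them permanent for all users.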
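Once the environment variables have taken effect, run the pig -help command to check whether Pig was installed successfully. A successful installation prints output like the following: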
hadoop@master:~/hadoop-1.0.1/pig-0.10.0$ pig -help
Apache Pig version 0.10.0 (r1328203)
compiled Apr 19 2012, 22:54:12
USAGE: Pig [options] [-] : Run interactively in grunt shell.
       Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
       Pig [options] [-f[ile]] file : Run cmds found in file.
  options include:
    -4, -log4jconf - Log4j configuration file, overrides log conf
    -b, -brief - Brief logging (no timestamps)
    -c, -check - Syntax check
    -d, -debug - Debug level, INFO is default
    -e, -execute - Commands to execute (within quotes)
    -f, -file - Path to the script to execute
    -g, -embedded - ScriptEngine classname or keyword for the ScriptEngine
    -h, -help - Display this message. You can specify topic to get help for that topic.
        properties is the only topic currently supported: -h properties.
    -i, -version - Display version information
    -l, -logfile - Path to client side log file; default is current working directory.
    -m, -param_file - Path to the parameter file
    -p, -param - Key value pair of the form param=val
    -r, -dryrun - Produces script with substituted parameters. Script is not executed.
    -t, -optimizer_off - Turn optimizations off. The following values are supported:
            SplitFilter - Split filter conditions
            PushUpFilter - Filter as early as possible
            MergeFilter - Merge filter conditions
            PushDownForeachFlatten - Join or explode as late as possible
            LimitOptimizer - Limit as early as possible
            ColumnMapKeyPrune - Remove unused data
            AddForEach - Add ForEach to remove unneeded columns
            MergeForEach - Merge adjacent ForEach
            GroupByConstParallelSetter - Force parallel 1 for "group all" statement
            All - Disable all optimizations
        All optimizations listed here are enabled by default. Optimization values
        are case insensitive.
    -v, -verbose - Print all error messages to screen
    -w, -warning - Turn warning logging on; also turns warning aggregation off
    -x, -exectype - Set execution mode: local|mapreduce, default is mapreduce.
    -F, -stop_on_failure - Aborts execution on the first failed job; default is off
    -M, -no_multiquery - Turn multiquery optimization off; default is on
    -P, -propertyFile - Path to property file
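A few of the options listed above can be combined into a quick sanity check. The sketch below is illustrative only: check_pig_script is a hypothetical helper, and the script name in the usage comment is made up. It uses -version to print the installed version and then -x local together with -check to syntax-check a Pig Latin script without contacting a Hadoop cluster.

```shell
# check_pig_script is a hypothetical helper, not part of Pig.
# $1 = path to a Pig Latin script to check.
check_pig_script() {
    # Fail early with a message if the pig launcher is not reachable.
    if ! command -v pig >/dev/null 2>&1; then
        echo "pig is not on PATH" >&2
        return 1
    fi
    # -version: display version information.
    pig -version || return 1
    # -x local: local execution mode; -check: syntax check only.
    pig -x local -check "$1"
}

# Example call (the script name is an assumption):
#   check_pig_script wordcount.pig
```

Running the check in local mode keeps it independent of the cluster configuration, so a failure at this stage points to the Pig installation or the script itself rather than to Hadoop.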