References
【Books】
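[1] Tom White. Hadoop: The Definitive Guide (Chinese edition) [M]. 2nd ed. Translated by Zhou Minqi, Wang Xiaoling, Jin Cheqing, Qian Weining. Beijing: Tsinghua University Press, 2011.
[2] Chuck Lam. Hadoop in Action (Chinese edition) [M]. Translated by Han Jizhong. Beijing: Posts & Telecom Press, 2011.
[3] Eric Sammer. Hadoop Operations. O'Reilly Media, 2012.
[4] Sun Yuqin. Java网络编程精解 (Java Network Programming in Depth) [M]. Beijing: Publishing House of Electronics Industry, 2007.
[5] Ron Hitchens. Java NIO. O'Reilly Media, 2002.
[6] George Coulouris, Jean Dollimore, Tim Kindberg. Distributed Systems: Concepts and Design (Chinese edition) [M]. Translated by Jin Beihong, et al. Beijing: China Machine Press, 2004.
[7] Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software (Chinese edition) [M]. Translated by Li Yingjun, et al. Beijing: China Machine Press, 2000.
[8] Eric Freeman, Elisabeth Freeman, Kathy Sierra, Bert Bates. Head First Design Patterns (Chinese edition) [M]. O'Reilly, trans. Beijing: China Electric Power Press, 2007.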
【Papers】
[1] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” in Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6. Berkeley, CA, USA: USENIX Association, 2004, pp. 107-113.
[2] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. In 19th Symposium on Operating Systems Principles, pages 29-43, Lake George, New York, 2003.
[3] Jorge-Arnulfo Quiané-Ruiz, Christoph Pinkel, Jörg Schad, Jens Dittrich. RAFTing MapReduce: Fast recovery on the RAFT. In Serge Abiteboul, Klemens Böhm, Christoph Koch, Kian-Lee Tan, editors, Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11-16, 2011, Hannover, Germany.
[4] Matei Zaharia, Andrew Konwinski, Anthony D. Joseph, Randy Katz, and Ion Stoica, Improving MapReduce Performance in Heterogeneous Environments, 8th USENIX Symposium on Operating Systems Design and Implementation, pp. 29-42, San Diego, CA, December 2008.
[5] Quan Chen, Daqiang Zhang, Minyi Guo, Qianni Deng, Song Guo, “SAMR: A Self-adaptive MapReduce Scheduling Algorithm in Heterogeneous Environment,” Computer and Information Technology (CIT), 2010 IEEE 10th International Conference.
[6] Liang Liyin, “Alibaba Hadoop Cluster Architecture and Service System”, PPT, Hadoop and Big Data Technology Conference (HBTC 2012).
[7] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. In USENIX NSDI, 2011.
[8] Hong Mao, Shengqiu Hu, Zhenzhong Zhang, Limin Xiao, Li Ruan: A Load-Driven Task Scheduler with Adaptive DSC for MapReduce. GreenCom 2011: 28-33.
[9] Yandong Wang, Xinyu Que, Weikuan Yu, Dror Goldenberg, Dhiraj Sehgal. Hadoop Acceleration through Network Levitated Merging. SC11. Seattle, WA.
[10] Herodotos Herodotou. Hadoop Performance Models, Technical Report, CS-2011-05, Computer Science Department, Duke University.
[11] Lian Linjiang: “The Development of Distributed Computing Technology at Baidu”, 2012.07.08.
[12] M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, “Job scheduling for multi-user MapReduce clusters,” EECS Department, University of California, Berkeley, Tech. Rep., Apr 2009.
[13] M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, “Efficient Fair Scheduling for MapReduce”, PPT.
[14] Todd Lipcon, Cloudera, “Optimizing MapReduce Job Performance”, Hadoop Summit 2012.
[15] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, “Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling,” in Proc. of EuroSys. ACM, 2010, pp. 265-278.
[16] Thomas Sandholm and Kevin Lai. Dynamic proportional share scheduling in Hadoop. In JSSPP'10: 15th Workshop on Job Scheduling Strategies for Parallel Processing, 2010.
[17] J. Polo, D. Carrera, Y. Becerra, J. Torres, E. Ayguade, M. Steinder, and I. Whalley, “Performance-driven task co-scheduling for MapReduce environments,” in Network Operations and Management Symposium (NOMS), 2010 IEEE, 2010, pp. 373-380.
[18] Faraz Ahmad, Seyong Lee, Mithuna Thottethodi and T. N. Vijaykumar, “MapReduce with Communication Overlap (MaRCO)”, ECE Technical Reports, 2007.11.01.
[19] Owen O'Malley, “Plugging the Holes: Security and Compatibility”, PPT.
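[20] Instructional Design for the Kerberos Authentication Protocol, Computer Systems and Network Security Design research group, School of Science and Engineering, University of Electronic Science and Technology of China.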
[21] Owen O'Malley, Kan Zhang, Sanjay Radia, Ram Marti, and Christopher Harrell, “Hadoop Security Design”, Yahoo!
[22] Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker and I. Stoica, NSDI 2011, March 2011.
[23] Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, NSDI 2011, March 2011.
[24] “Some Software Design Patterns in the YARN (Hadoop 2) Framework”, CSDN.
[25] AMD white paper: “Hadoop Performance Tuning Guide”.
【Hadoop Jira[1]】
[1] HDFS-1052: HDFS scalability with multiple namenodes.
[2] HDFS-1623: High Availability Framework for HDFS NN. HDFS-200: In HDFS, sync() not yet guarantees data available to the new readers.
[3] HDFS-265: Revisit append.
[4] HDFS-503: Implement erasure coding as a layer on HDFS.
[5] HDFS-245: Create symbolic links in HDFS.
[6] HADOOP-4487: Security features for Hadoop.
[7] HADOOP-6332: Large-scale Automated Test Framework.
[8] HADOOP-1230: Replace parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes.
[9] MAPREDUCE-334: Change mapred.lib code to use new api.
[10] HADOOP-1722: Make streaming to handle non-utf8 byte array.
[11] HADOOP-7775: RPC Layer improvements to support protocol compatibility.
[12] HADOOP-7347: IPC Wire Compatibility.
[13] HADOOP-4797: RPC Server can leave a lot of direct buffers.
[14] HDFS-2676: Remove Avro RPC.
[15] HDFS-2058: DataTransfer Protocol using protobufs.
[16] MAPREDUCE-1099: Setup and cleanup tasks could affect job latency if they are caught running on bad nodes.
[17] MAPREDUCE-463: The job setup and cleanup tasks should be optional.
[18] MAPREDUCE-744: Support in DistributedCache to share cache files with other users after HADOOP-4493.
[19] HADOOP-153: skip records that fail Task.
[20] HADOOP-2141: speculative execution start up condition based on completion time.
[21] MAPREDUCE-2657: TaskTracker should handle disk failures.
[22] MAPREDUCE-1906: Lower minimum heartbeat interval for tasktracker > Jobtracker.
[23] HADOOP-3245: Provide ability to persist running jobs (extend HADOOP-1876).
[24] MAPREDUCE-873: Simplify Job Recovery.
[25] MAPREDUCE-211: Provide a node health check script and run it periodically to check the node health status.
[26] HADOOP-4305: repeatedly blacklisted tasktrackers should get declared dead.
[27] HADOOP-5643: Ability to blacklist tasktracker.
[28] MAPREDUCE-2657: TaskTracker should handle disk failures.
[29] MAPREDUCE-2415: Distribute TaskTracker userlogs onto multiple disks.
[30] HADOOP-692: Rack-aware Replica Placement.
[31] MAPREDUCE-2415: Distribute TaskTracker userlogs onto multiple disks.
[32] MAPREDUCE-2364: Shouldn't hold lock on rjob while localizing resources.
[33] HADOOP-5883: TaskMemoryMonitorThread might shoot down tasks even if their processes momentarily exceed the requested memory.
[34] MAPREDUCE-1221: Kill tasks on a node if the free physical memory on that machine falls below a configured threshold.
[35] MAPREDUCE-211: Provide a node health check script and run it periodically to check the node health status.
[36] MAPREDUCE-4039: Sort Avoidance.
[37] MAPREDUCE-4049: plugin for generic shuffle service.
[38] HADOOP-331: map outputs should be written to a single output file with an index.
[39] MAPREDUCE-240: Improve the shuffle phase by using the “connection: keep-alive” and doing batch transfers of files.
[40] MAPREDUCE-2841: Task level native optimization.
[41] MAPREDUCE-64: Map-side sort is hampered by io.sort.record.percent.
[42] HADOOP-1965: Handle map output buffers better.
[43] MAPREDUCE-1380: Adaptive Scheduler.
[44] MAPREDUCE-1439: Learning Scheduler.
[45] MAPREDUCE-4360: Capacity Scheduler Hierarchical leaf queue does not honor the max capacity of container queue.
[46] MAPREDUCE-2905: CapBasedLoadManager incorrectly allows assignment when assignMultiple is true (was: assignmultiple per job).
[47] HADOOP-4487: Security features for Hadoop.
[48] MAPREDUCE-2405: MR-279: Implement uber-AppMaster (in-cluster LocalJobRunner for MRv2).
[49] YARN-3: Add support for CPU isolation/monitoring of containers.
[50] YARN-2: Enhance CS to schedule accounting for both memory and cpu cores.
[51] YARN-137: Change the default scheduler to the CapacityScheduler.
[52] MAPREDUCE-211: Provide a node health check script and run it periodically to check the node health status.
[53] MAPREDUCE-1906: Lower default minimum heartbeat interval for tasktracker > Jobtracker.
[54] MAPREDUCE-2355: Add an out of band heartbeat damper.
[55] HADOOP-3136: Assign multiple tasks per TaskTracker heartbeat.
[56] HADOOP-7206: Integrate Snappy compression.
[57] HADOOP-7714: Umbrella for usage of native calls to manage OS cache and readahead.
【Web Resources】
[1] Apache log4j website: http://logging.apache.org/log4j/index.html.
[2] Nutch official website: http://nutch.apache.org/.
[3] Lucene official website: http://lucene.apache.org/.
[4] Introduction to HDFS RAID: http://wiki.apache.org/hadoop/HDFS-RAID.
[5] An update on Apache Hadoop 1.0: http://blog.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/.
[6] Introduction to the Fault Injection framework: http://hadoop.apache.org/docs/hdfs/r0.21.0/faultinject_framework.html.
[7] Spark official homepage: http://www.spark-project.org/.
[8] Oozie official homepage: http://incubator.apache.org/oozie/.
[9] Sort Benchmark: http://sortbenchmark.org/.
[10] HBase official homepage: http://hbase.apache.org/.
[11] Hive official homepage: http://hive.apache.org/.
[12] Pig official homepage: http://pig.apache.org/.
[13] Cascading official homepage: http://www.cascading.org/.
[14] Azkaban official homepage: http://sna-projects.com/azkaban/.
[15] Using Hadoop IPC/RPC for distributed applications: http://www.supermind.org/blog/520.
[16] Architecture of a Highly Scalable NIO-Based Server: http://today.java.net/pub/a/today/2007/02/13/architecture-of-highly-scalable-nio-server.html.
[17] New I/O APIs: http://docs.oracle.com/javase/1.4.2/docs/guide/nio/.
[18] Thrift official homepage: http://thrift.apache.org/.
[19] Protocol Buffers official homepage: http://code.google.com/p/protobuf/.
[20] Avro official homepage: http://avro.apache.org/.
[21] “A Detailed Guide to Debugging Hadoop Streaming Programs on Hadoop”, Daofan (道凡).
[22] Hanborq optimized Hadoop Distribution: https://github.com/hanborq/hadoop.
[23] MapReduce: The Shuffle Process Explained in Detail: http://langyu.iteye.com/blog/992916.
[24] Quicksort and Its Optimization: http://rdc.taobao.com/team/jm/archives/252.
[25] Hadoop Source Code Analysis: http://caibinbupt.iteye.com/.
[26] nativetask code and documentation: https://github.com/decster/nativetask.
[27] HOD documentation: http://hadoop.apache.org/docs/stable/hod_scheduler.html.
[28] Torque official website: http://www.adaptivecomputing.com/products/open-source/torque/.
[29] Capacity Scheduler documentation: http://hadoop.apache.org/docs/stable/capacity_scheduler.html.
[30] Fair Scheduler documentation: http://hadoop.apache.org/docs/stable/fair_scheduler.html.
[31] Max-Min Fairness (Wikipedia): http://en.wikipedia.org/wiki/Max-min_fairness.
[32] Kerberos wiki introduction: http://jianlee.ylinux.org/Computer/Wiki/kerberos.html.
[33] Cloudera CDH3 documentation: https://ccp.cloudera.com/display/CDHDOC/CDH3+Security+Guide.
[34] Comparison of YARN and Mesos: http://www.quora.com/How-does-YARN-compare-to-Mesos.
[35] Hortonworks official blog: http://hortonworks.com/blog/.
[36] Cloudera official blog: http://blog.cloudera.com/blog/.
[37] Facebook Hadoop code: https://github.com/facebook/hadoop-20.
[38] Mesos official website: http://www.mesosproject.org/.
[39] http://www.oberhumer.com/opensource/lzo/.
[40] http://code.google.com/p/snappy/.
[41] https://github.com/toddlipcon/hadoop-lzo.
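[1] Hadoop Jira is Hadoop's project management system, through which the resolution of individual issues can be tracked. For example, the issue “HADOOP-7775” can be viewed at “https://issues.apache.org/jira/browse/HADOOP-7775”.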