References

【Reference Books】

[1]Tom White. Hadoop: The Definitive Guide [M]. 2nd ed. Chinese translation by 周敏奇, 王晓玲, 金澈清, 钱卫宁. Beijing: Tsinghua University Press, 2011.

[2]Chuck Lam. Hadoop in Action [M]. Chinese translation by 韩冀中. Beijing: Posts & Telecom Press, 2011.

[3]Eric Sammer. Hadoop Operations. O'Reilly Media, 2012.

[4]孙玉琴. Java网络编程精解 (Java Network Programming in Depth) [M]. Beijing: Publishing House of Electronics Industry, 2007.

[5]Ron Hitchens. Java NIO. O'Reilly Media, 2002.

[6]George Coulouris, Jean Dollimore, Tim Kindberg. Distributed Systems: Concepts and Design [M]. Chinese translation by 金蓓弘 et al. Beijing: China Machine Press, 2004.

[7]Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software [M]. Chinese translation by 李英军 et al. Beijing: China Machine Press, 2000.

[8]Eric Freeman, Elisabeth Freeman, Kathy Sierra, Bert Bates. Head First Design Patterns [M]. Chinese translation by O'Reilly. Beijing: China Electric Power Press, 2007.

【Reference Papers】

[1]J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6. Berkeley, CA, USA: USENIX Association, 2004, pp. 107-113.

[2]Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. In 19th Symposium on Operating Systems Principles, pages 29-43, Lake George, New York, 2003.

[3]Jorge-Arnulfo Quiané-Ruiz, Christoph Pinkel, Jörg Schad, Jens Dittrich. RAFTing MapReduce: Fast Recovery on the RAFT. In Serge Abiteboul, Klemens Böhm, Christoph Koch, Kian-Lee Tan, editors, Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11-16, 2011, Hannover, Germany.

[4]Matei Zaharia, Andrew Konwinski, Anthony D. Joseph, Randy Katz, and Ion Stoica. Improving MapReduce Performance in Heterogeneous Environments. 8th USENIX Symposium on Operating Systems Design and Implementation, pp. 29-42, San Diego, CA, December 2008.

[5]Quan Chen, Daqiang Zhang, Minyi Guo, Qianni Deng, Song Guo, "SAMR: A Self-adaptive MapReduce Scheduling Algorithm in Heterogeneous Environment," 2010 IEEE 10th International Conference on Computer and Information Technology (CIT).

[6]梁李印, "Alibaba Hadoop Cluster Architecture and Service System" (阿里Hadoop集群架构及服务体系), slides, Hadoop and Big Data Technology Conference (HBTC 2012).

[7]A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. In USENIX NSDI, 2011.

[8]Hong Mao, Shengqiu Hu, Zhenzhong Zhang, Limin Xiao, Li Ruan. A Load-Driven Task Scheduler with Adaptive DSC for MapReduce. GreenCom 2011: 28-33.

[9]Yandong Wang, Xinyu Que, Weikuan Yu, Dror Goldenberg, Dhiraj Sehgal. Hadoop Acceleration through Network Levitated Merging. SC11, Seattle, WA.

[10]Herodotos Herodotou. Hadoop Performance Models. Technical Report CS-2011-05, Computer Science Department, Duke University.

[11]连林江, "Development of Baidu's Distributed Computing Technology" (百度分布式计算技术发展), July 8, 2012.

[12]M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, "Job Scheduling for Multi-User MapReduce Clusters," Tech. Rep., EECS Department, University of California, Berkeley, Apr. 2009.

[13]M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, "Efficient Fair Scheduling for MapReduce," slides.

[14]Todd Lipcon, Cloudera, "Optimizing MapReduce Job Performance," Hadoop Summit 2012.

[15]M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, "Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling," in Proc. of EuroSys, ACM, 2010, pp. 265-278.

[16]Thomas Sandholm and Kevin Lai. Dynamic Proportional Share Scheduling in Hadoop. In JSSPP '10: 15th Workshop on Job Scheduling Strategies for Parallel Processing, 2010.

[17]J. Polo, D. Carrera, Y. Becerra, J. Torres, E. Ayguadé, M. Steinder, and I. Whalley, "Performance-Driven Task Co-Scheduling for MapReduce Environments," in 2010 IEEE Network Operations and Management Symposium (NOMS), 2010, pp. 373-380.

[18]Faraz Ahmad, Seyong Lee, Mithuna Thottethodi, and T. N. Vijaykumar, "MapReduce with Communication Overlap (MaRCO)," ECE Technical Reports, Nov. 1, 2007.

[19]Owen O'Malley, "Plugging the Holes: Security and Compatibility," slides.

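[20]"Teaching Design of the Kerberos Authentication Protocol" (Kerberos认证协议的教学设计), Computer System and Network Security Design Research Group, School of Science and Engineering, University of Electronic Science and Technology of China.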

[21]Owen O'Malley, Kan Zhang, Sanjay Radia, Ram Marti, and Christopher Harrell, "Hadoop Security Design," Yahoo!

[22]Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica, NSDI 2011, March 2011.

[23]Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, NSDI 2011, March 2011.

[24]"Some Software Design Patterns in the YARN (Hadoop 2) Framework" (yarn（hadoop2）框架的一些软件设计模式), CSDN.

[25]AMD white paper: "Hadoop Performance Tuning Guide."

【Referenced Hadoop Jira Issues^[1]】

[1]HDFS-1052: HDFS scalability with multiple namenodes.

[2]HDFS-1623: High Availability Framework for HDFS NN. HDFS-200: In HDFS, sync() not yet guarantees data available to the new readers.

[3]HDFS-265: Revisit append.

[4]HDFS-503: Implement erasure coding as a layer on HDFS.

[5]HDFS-245: Create symbolic links in HDFS.

[6]HADOOP-4487: Security features for Hadoop.

[7]HADOOP-6332: Large-scale Automated Test Framework.

[8]HADOOP-1230: Replace parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes.

[9]MAPREDUCE-334: Change mapred.lib code to use new api.

[10]HADOOP-1722: Make streaming to handle non-utf8 byte array.

[11]HADOOP-7775: RPC Layer improvements to support protocol compatibility.

[12]HADOOP-7347: IPC Wire Compatibility.

[13]HADOOP-4797: RPC Server can leave a lot of direct buffers.

[14]HDFS-2676: Remove Avro RPC.

[15]HDFS-2058: DataTransfer Protocol using protobufs.

[16]MAPREDUCE-1099: Setup and cleanup tasks could affect job latency if they are caught running on bad nodes.

[17]MAPREDUCE-463: The job setup and cleanup tasks should be optional.

[18]MAPREDUCE-744: Support in DistributedCache to share cache files with other users after HADOOP-4493.

[19]HADOOP-153: skip records that fail Task.

[20]HADOOP-2141: speculative execution start up condition based on completion time.

[21]MAPREDUCE-2657: TaskTracker should handle disk failures.

[22]MAPREDUCE-1906: Lower minimum heartbeat interval for tasktracker > Jobtracker.

[23]HADOOP-3245: Provide ability to persist running jobs (extend HADOOP-1876).

[24]MAPREDUCE-873: Simplify Job Recovery.

[25]MAPREDUCE-211: Provide a node health check script and run it periodically to check the node health status.

[26]HADOOP-4305: repeatedly blacklisted tasktrackers should get declared dead.

[27]HADOOP-5643: Ability to blacklist tasktracker.

[28]MAPREDUCE-2657: TaskTracker should handle disk failures.

[29]MAPREDUCE-2415: Distribute TaskTracker userlogs onto multiple disks.

[30]HADOOP-692: Rack-aware Replica Placement.

[31]MAPREDUCE-2415: Distribute TaskTracker userlogs onto multiple disks.

[32]MAPREDUCE-2364: Shouldn't hold lock on rjob while localizing resources.

[33]HADOOP-5883: TaskMemoryMonitorThread might shoot down tasks even if their processes momentarily exceed the requested memory.

[34]MAPREDUCE-1221: Kill tasks on a node if the free physical memory on that machine falls below a configured threshold.

[35]MAPREDUCE-211: Provide a node health check script and run it periodically to check the node health status.

[36]MAPREDUCE-4039: Sort Avoidance.

[37]MAPREDUCE-4049: plugin for generic shuffle service.

[38]HADOOP-331: map outputs should be written to a single output file with an index.

[39]MAPREDUCE-240: Improve the shuffle phase by using the "connection: keep-alive" and doing batch transfers of files.

[40]MAPREDUCE-2841: Task level native optimization.

[41]MAPREDUCE-64: Map-side sort is hampered by io.sort.record.percent.

[42]HADOOP-1965: Handle map output buffers better.

[43]MAPREDUCE-1380: Adaptive Scheduler.

[44]MAPREDUCE-1439: Learning Scheduler.

[45]MAPREDUCE-4360: Capacity Scheduler Hierarchical leaf queue does not honor the max capacity of container queue.

[46]MAPREDUCE-2905: CapBasedLoadManager incorrectly allows assignment when assignMultiple is true (was: assignmultiple per job).

[47]HADOOP-4487: Security features for Hadoop.

[48]MAPREDUCE-2405: MR-279: Implement uber-AppMaster (in-cluster LocalJobRunner for MRv2).

[49]YARN-3: Add support for CPU isolation/monitoring of containers.

[50]YARN-2: Enhance CS to schedule accounting for both memory and cpu cores.

[51]YARN-137: Change the default scheduler to the CapacityScheduler.

[52]MAPREDUCE-211: Provide a node health check script and run it periodically to check the node health status.

[53]MAPREDUCE-1906: Lower default minimum heartbeat interval for tasktracker > Jobtracker.

[54]MAPREDUCE-2355: Add an out of band heartbeat damper.

[55]HADOOP-3136: Assign multiple tasks per TaskTracker heartbeat.

[56]HADOOP-7206: Integrate Snappy compression.

[57]HADOOP-7714: Umbrella for usage of native calls to manage OS cache and readahead.

【Referenced Web Resources】

[1]Apache log4j website: http://logging.apache.org/log4j/index.html.

[2]Nutch official website: http://nutch.apache.org/.

[3]Lucene official website: http://lucene.apache.org/.

[4]Introduction to HDFS RAID: http://wiki.apache.org/hadoop/HDFS-RAID.

[5]An update on Apache Hadoop 1.0: http://blog.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/.

[6]Introduction to the Fault Inject framework: http://hadoop.apache.org/docs/hdfs/r0.21.0/faultinject_framework.html.

[7]Spark official homepage: http://www.spark-project.org/.

[8]Oozie official homepage: http://incubator.apache.org/oozie/.

[9]Sort benchmarks: http://sortbenchmark.org/.

[10]HBase official homepage: http://hbase.apache.org/.

[11]Hive official homepage: http://hive.apache.org/.

[12]Pig official homepage: http://pig.apache.org/.

[13]Cascading official homepage: http://www.cascading.org/.

[14]Azkaban official homepage: http://sna-projects.com/azkaban/.

[15]Using Hadoop IPC/RPC for distributed applications: http://www.supermind.org/blog/520.

[16]Architecture of a Highly Scalable NIO-Based Server: http://today.java.net/pub/a/today/2007/02/13/architecture-of-highly-scalable-nio-server.html.

[17]New I/O APIs: http://docs.oracle.com/javase/1.4.2/docs/guide/nio/.

[18]Thrift official homepage: http://thrift.apache.org/.

[19]Protocol Buffers official homepage: http://code.google.com/p/protobuf/.

[20]Avro official homepage: http://avro.apache.org/.

[21]"A Detailed Guide to Debugging Hadoop Streaming Programs on Hadoop" (在Hadoop上调试HadoopStreaming程序的方法详解), 道凡.

[22]Hanborq optimized Hadoop Distribution: https://github.com/hanborq/hadoop.

[23]MapReduce: The Shuffle Process Explained (MapReduce：详解Shuffle过程): http://langyu.iteye.com/blog/992916.

[24]Quicksort and Its Optimization (快速排序及优化): http://rdc.taobao.com/team/jm/archives/252.

[25]Hadoop Source Code Analysis (Hadoop源代码分析): http://caibinbupt.iteye.com/.

[26]nativetask code and documentation: https://github.com/decster/nativetask.

[27]HOD documentation: http://hadoop.apache.org/docs/stable/hod_scheduler.html.

[28]Torque official website: http://www.adaptivecomputing.com/products/open-source/torque/.

[29]Capacity Scheduler documentation: http://hadoop.apache.org/docs/stable/capacity_scheduler.html.

[30]Fair Scheduler documentation: http://hadoop.apache.org/docs/stable/fair_scheduler.html.

[31]Max-Min Fairness (Wikipedia): http://en.wikipedia.org/wiki/Max-min_fairness.

[32]Kerberos wiki introduction: http://jianlee.ylinux.org/Computer/Wiki/kerberos.html.

[33]Cloudera CDH3 documentation: https://ccp.cloudera.com/display/CDHDOC/CDH3+Security+Guide.

[34]Comparison of YARN and Mesos: http://www.quora.com/How-does-YARN-compare-to-Mesos.

[35]Hortonworks official blog: http://hortonworks.com/blog/.

[36]Cloudera official blog: http://blog.cloudera.com/blog/.

[37]Facebook's Hadoop code: https://github.com/facebook/hadoop-20.

[38]Mesos official website: http://www.mesosproject.org/.

[39]http://www.oberhumer.com/opensource/lzo/.

[40]http://code.google.com/p/snappy/.

[41]https://github.com/toddlipcon/hadoop-lzo.

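[1]Hadoop Jira is Hadoop's project management (issue-tracking) system; it lets you follow how individual issues were resolved. For example, issue "HADOOP-7775" can be viewed at https://issues.apache.org/jira/browse/HADOOP-7775.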