[RFR] templates://hadoop/{hadoop-namenoded.templates}
Christian PERRIER
bubulle at debian.org
Tue Mar 30 16:22:03 UTC 2010

Quoting Justin B Rye (jbr at edlug.org.uk):
> Christian PERRIER wrote:
> > Your review should be sent as an answer to this mail.
>
> Sorry, I'm running late, comments but no actual patch attached.

No problem. Based on your comments and proposals, I cooked up the
attached review.
-------------- next part --------------
Source: hadoop
Section: java
Priority: optional
Maintainer: Debian Java Maintainers <pkg-java-maintainers at lists.alioth.debian.org>
Uploaders: Thomas Koch <thomas.koch at ymc.ch>
Homepage: http://hadoop.apache.org
Vcs-Browser: http://git.debian.org/?p=pkg-java/hadoop.git
Vcs-Git: git://git.debian.org/pkg-java/hadoop.git
Standards-Version: 3.8.4
Build-Depends: debhelper (>= 7.4.11), default-jdk, ant (>= 1.6.0), javahelper (>= 0.28),
po-debconf,
libcommons-cli-java,
libcommons-codec-java,
libcommons-el-java,
libcommons-httpclient-java,
libcommons-io-java,
libcommons-logging-java,
libcommons-net-java,
libtomcat6-java,
libjetty-java (>>6),
libservlet2.5-java,
liblog4j1.2-java,
libslf4j-java,
libxmlenc-java,
liblucene2-java,
libhsqldb-java,
ant-optional,
javacc

Package: libhadoop-java
Architecture: all
Depends: ${misc:Depends},
libcommons-cli-java,
libcommons-codec-java,
libcommons-el-java,
libcommons-httpclient-java,
libcommons-io-java,
libcommons-logging-java,
libcommons-net-java,
libtomcat6-java,
libjetty-java (>>6),
libservlet2.5-java,
liblog4j1.2-java,
libslf4j-java,
libxmlenc-java
Suggests: libhsqldb-java
Description: data-intensive clustering framework - Java libraries
Hadoop is a software platform for writing and running applications
that process vast amounts of data on a distributed file system.
.
Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters
of commonly available computers. These clusters can number
into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel
on the nodes where the data is located. This makes it
extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and
automatically redeploys computing tasks based on failures.
.
Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS).
MapReduce divides applications into many small blocks of work. HDFS creates
multiple replicas of data blocks for reliability, placing them on compute
nodes around the cluster. MapReduce can then process the data where it is
located.
.
This package contains the core Java libraries.

Package: libhadoop-index-java
Architecture: all
Depends: ${misc:Depends}, libhadoop-java (= ${binary:Version}),
liblucene2-java
Description: data-intensive clustering framework - Lucene index support
Hadoop is a software platform for writing and running applications
that process vast amounts of data on a distributed file system.
.
Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters
of commonly available computers. These clusters can number
into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel
on the nodes where the data is located. This makes it
extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and
automatically redeploys computing tasks based on failures.
.
Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS).
MapReduce divides applications into many small blocks of work. HDFS creates
multiple replicas of data blocks for reliability, placing them on compute
nodes around the cluster. MapReduce can then process the data where it is
located.
.
This contrib package provides a utility to build or update an index
using Map/Reduce.
.
A distributed "index" is partitioned into "shards". Each shard corresponds
to a Lucene instance. org.apache.hadoop.contrib.index.main.UpdateIndex
contains the main() method which uses a Map/Reduce job to analyze documents
and update Lucene instances in parallel.

Package: hadoop-bin
Section: misc
Architecture: all
Depends: ${misc:Depends}, libhadoop-java (= ${binary:Version}),
default-jre-headless | java6-runtime-headless
Description: data-intensive clustering framework - tools
Hadoop is a software platform for writing and running applications
that process vast amounts of data on a distributed file system.
.
Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters
of commonly available computers. These clusters can number
into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel
on the nodes where the data is located. This makes it
extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and
automatically redeploys computing tasks based on failures.
.
Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS).
MapReduce divides applications into many small blocks of work. HDFS creates
multiple replicas of data blocks for reliability, placing them on compute
nodes around the cluster. MapReduce can then process the data where it is
located.
.
This package contains the Hadoop shell interface. See the hadoop-.*d
packages for the Hadoop daemons.

Package: hadoop-daemons-common
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-bin (= ${binary:Version}), daemon, adduser,
lsb-base (>= 3.2-14)
Description: data-intensive clustering framework - common files
Hadoop is a software platform for writing and running applications
that process vast amounts of data on a distributed file system.
.
Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters
of commonly available computers. These clusters can number
into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel
on the nodes where the data is located. This makes it
extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and
automatically redeploys computing tasks based on failures.
.
This package prepares some common things for all Hadoop daemon packages:
* creates the user hadoop
* creates data and log directories owned by the hadoop user
* manages the update-alternatives mechanism for Hadoop configuration
* brings in the common dependencies

Package: libhadoop-java-doc
Section: doc
Architecture: all
Depends: ${misc:Depends}, libhadoop-java (= ${binary:Version})
Description: data-intensive clustering framework - Java documentation
Hadoop is a software platform for writing and running applications
that process vast amounts of data on a distributed file system.
.
Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters
of commonly available computers. These clusters can number
into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel
on the nodes where the data is located. This makes it
extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and
automatically redeploys computing tasks based on failures.
.
This package provides the API documentation of Hadoop.

Package: hadoop-tasktrackerd
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
Description: data-intensive clustering framework - Task Tracker
Hadoop is a software platform for writing and running applications
that process vast amounts of data on a distributed file system.
.
Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters
of commonly available computers. These clusters can number
into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel
on the nodes where the data is located. This makes it
extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and
automatically redeploys computing tasks based on failures.
.
The Task Tracker is the Hadoop service that accepts MapReduce tasks and
computes results. Each node in a Hadoop cluster that should be doing
computation should run a Task Tracker.

Package: hadoop-jobtrackerd
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
Description: data-intensive clustering framework - Job Tracker
Hadoop is a software platform for writing and running applications
that process vast amounts of data on a distributed file system.
.
Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters
of commonly available computers. These clusters can number
into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel
on the nodes where the data is located. This makes it
extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and
automatically redeploys computing tasks based on failures.
.
The Job Tracker is a central service responsible for managing
the Task Tracker services running on all nodes in a Hadoop cluster.
The Job Tracker allocates work to the Task Tracker nearest to the data
with an available work slot.

Package: hadoop-namenoded
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
Description: data-intensive clustering framework - name node
Hadoop is a software platform for writing and running applications
that process vast amounts of data on a distributed file system.
.
Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters
of commonly available computers. These clusters can number
into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel
on the nodes where the data is located. This makes it
extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and
automatically redeploys computing tasks based on failures.
.
The Hadoop Distributed Filesystem (HDFS) requires one unique server, the
Name Node, which manages the block locations of files on the file system.

Package: hadoop-secondarynamenoded
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
Description: data-intensive clustering framework - secondary name node
Hadoop is a software platform for writing and running applications
that process vast amounts of data on a distributed file system.
.
Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters
of commonly available computers. These clusters can number
into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel
on the nodes where the data is located. This makes it
extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and
automatically redeploys computing tasks based on failures.
.
The secondary Name Node is responsible for checkpointing file system images.
It is _not_ a failover partner for the Name Node, and may safely be run on the
same machine.

Package: hadoop-datanoded
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
Description: data-intensive clustering framework - data node
Hadoop is a software platform for writing and running applications
that process vast amounts of data on a distributed file system.
.
Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
* Economical: It distributes the data and processing across clusters
of commonly available computers. These clusters can number
into the thousands of nodes.
* Efficient: By distributing the data, Hadoop can process it in parallel
on the nodes where the data is located. This makes it
extremely rapid.
* Reliable: Hadoop automatically maintains multiple copies of data and
automatically redeploys computing tasks based on failures.
.
The Data Nodes in the Hadoop Cluster are responsible for serving up
blocks of data over the network to Hadoop Distributed Filesystem
(HDFS) clients.
-------------- next part --------------
Template: hadoop-namenoded/format
Type: boolean
Default: false
_Description: Should the namenode's file system be formatted?
The namenode manages the Hadoop Distributed File System (HDFS). Like
a normal file system, it needs to be formatted before use;
otherwise the namenode daemon will not start.
.
This operation does not affect other file systems on this
computer. You can safely choose to format the file system if you're
using HDFS for the first time and don't have data from previous
installations on this computer.
.
If you choose not to format the file system right now, you can do it
later by executing "hadoop namenode -format" as the user "hadoop".
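
For reference, a minimal sketch of that later, one-time format step,
assuming hadoop-bin is installed and the "hadoop" user already exists
(hadoop-daemons-common creates it):

  # format the namenode's file system, running as the hadoop user
  su -c "hadoop namenode -format" hadoop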
-------------- next part --------------
--- hadoop.old/debian/hadoop-namenoded.templates 2010-03-22 09:56:11.717948376 +0100
+++ hadoop/debian/hadoop-namenoded.templates 2010-03-30 18:09:54.250674621 +0200
@@ -1,17 +1,15 @@
Template: hadoop-namenoded/format
Type: boolean
Default: false
-_Description: Should the namenode's filesystem be formatted now?
- The namenode manages the Hadoop Distributed FileSystem (HDFS). Like a
- normal filesystem, it needs to be formatted prior to first use. If the
- HDFS filesystem is not formatted, the namenode daemon will fail to
- start.
+_Description: Should the namenode's file system be formatted?
+ The namenode manages the Hadoop Distributed File System (HDFS). Like
+ a normal file system, it needs to be formatted before use;
+ otherwise the namenode daemon will not start.
.
- This operation does not affect the "normal" filesystem on this
- computer. If you're using HDFS for the first time and don't have data
- from previous installations on this computer, it should be save to
- proceed with yes.
+ This operation does not affect other file systems on this
+ computer. You can safely choose to format the file system if you're
+ using HDFS for the first time and don't have data from previous
+ installations on this computer.
.
- You can later on format the filesystem yourself with
- .
- su -c"hadoop namenode -format" hadoop
+ If you choose not to format the file system right now, you can do it
+ later by executing "hadoop namenode -format" as the user "hadoop".
--- hadoop.old/debian/control 2010-03-22 09:56:11.717948376 +0100
+++ hadoop/debian/control 2010-03-30 18:20:36.183901228 +0200
@@ -44,14 +44,54 @@
libslf4j-java,
libxmlenc-java
Suggests: libhsqldb-java
-Description: software platform for processing vast amounts of data
- This package contains the core java libraries.
+Description: data-intensive clustering framework - Java libraries
+ Hadoop is a software platform for writing and running applications
+ that process vast amounts of data on a distributed file system.
+ .
+ Here's what makes Hadoop especially useful:
+ * Scalable: Hadoop can reliably store and process petabytes.
+ * Economical: It distributes the data and processing across clusters
+ of commonly available computers. These clusters can number
+ into the thousands of nodes.
+ * Efficient: By distributing the data, Hadoop can process it in parallel
+ on the nodes where the data is located. This makes it
+ extremely rapid.
+ * Reliable: Hadoop automatically maintains multiple copies of data and
+ automatically redeploys computing tasks based on failures.
+ .
+ Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS).
+ MapReduce divides applications into many small blocks of work. HDFS creates
+ multiple replicas of data blocks for reliability, placing them on compute
+ nodes around the cluster. MapReduce can then process the data where it is
+ located.
+ .
+ This package contains the core Java libraries.

Package: libhadoop-index-java
Architecture: all
Depends: ${misc:Depends}, libhadoop-java (= ${binary:Version}),
liblucene2-java
-Description: Hadoop contrib to create lucene indexes
+Description: data-intensive clustering framework - Lucene index support
+ Hadoop is a software platform for writing and running applications
+ that process vast amounts of data on a distributed file system.
+ .
+ Here's what makes Hadoop especially useful:
+ * Scalable: Hadoop can reliably store and process petabytes.
+ * Economical: It distributes the data and processing across clusters
+ of commonly available computers. These clusters can number
+ into the thousands of nodes.
+ * Efficient: By distributing the data, Hadoop can process it in parallel
+ on the nodes where the data is located. This makes it
+ extremely rapid.
+ * Reliable: Hadoop automatically maintains multiple copies of data and
+ automatically redeploys computing tasks based on failures.
+ .
+ Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS).
+ MapReduce divides applications into many small blocks of work. HDFS creates
+ multiple replicas of data blocks for reliability, placing them on compute
+ nodes around the cluster. MapReduce can then process the data where it is
+ located.
+ .
This contrib package provides a utility to build or update an index
using Map/Reduce.
.
@@ -65,9 +105,9 @@
Architecture: all
Depends: ${misc:Depends}, libhadoop-java (= ${binary:Version}),
default-jre-headless | java6-runtime-headless
-Description: software platform for processing vast amounts of data
- Hadoop is a software platform that lets one easily write and
- run applications that process vast amounts of data.
+Description: data-intensive clustering framework - tools
+ Hadoop is a software platform for writing and running applications
+ that process vast amounts of data on a distributed file system.
.
Here's what makes Hadoop especially useful:
* Scalable: Hadoop can reliably store and process petabytes.
@@ -86,33 +126,75 @@
nodes around the cluster. MapReduce can then process the data where it is
located.
.
- This package contains the hadoop shell interface. See the packages hadoop-.*d
- for the hadoop daemons.
+ This package contains the Hadoop shell interface. See the hadoop-.*d
+ packages for the Hadoop daemons.

Package: hadoop-daemons-common
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-bin (= ${binary:Version}), daemon, adduser,
lsb-base (>= 3.2-14)
-Description: Creates user and directories for hadoop daemons
- Prepares some common things for all hadoop daemon packages:
+Description: data-intensive clustering framework - common files
+ Hadoop is a software platform for writing and running applications
+ that process vast amounts of data on a distributed file system.
+ .
+ Here's what makes Hadoop especially useful:
+ * Scalable: Hadoop can reliably store and process petabytes.
+ * Economical: It distributes the data and processing across clusters
+ of commonly available computers. These clusters can number
+ into the thousands of nodes.
+ * Efficient: By distributing the data, Hadoop can process it in parallel
+ on the nodes where the data is located. This makes it
+ extremely rapid.
+ * Reliable: Hadoop automatically maintains multiple copies of data and
+ automatically redeploys computing tasks based on failures.
+ .
+ This package prepares some common things for all Hadoop daemon packages:
* creates the user hadoop
* creates data and log directories owned by the hadoop user
- * manages the update-alternatives mechanism for hadoop configuration
+ * manages the update-alternatives mechanism for Hadoop configuration
* brings in the common dependencies

Package: libhadoop-java-doc
Section: doc
Architecture: all
Depends: ${misc:Depends}, libhadoop-java (= ${binary:Version})
-Description: Contains the javadoc for hadoop
- contains the api documentation of hadoop
+Description: data-intensive clustering framework - Java documentation
+ Hadoop is a software platform for writing and running applications
+ that process vast amounts of data on a distributed file system.
+ .
+ Here's what makes Hadoop especially useful:
+ * Scalable: Hadoop can reliably store and process petabytes.
+ * Economical: It distributes the data and processing across clusters
+ of commonly available computers. These clusters can number
+ into the thousands of nodes.
+ * Efficient: By distributing the data, Hadoop can process it in parallel
+ on the nodes where the data is located. This makes it
+ extremely rapid.
+ * Reliable: Hadoop automatically maintains multiple copies of data and
+ automatically redeploys computing tasks based on failures.
+ .
+ This package provides the API documentation of Hadoop.

Package: hadoop-tasktrackerd
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
-Description: Task Tracker for Hadoop
+Description: data-intensive clustering framework - Task Tracker
+ Hadoop is a software platform for writing and running applications
+ that process vast amounts of data on a distributed file system.
+ .
+ Here's what makes Hadoop especially useful:
+ * Scalable: Hadoop can reliably store and process petabytes.
+ * Economical: It distributes the data and processing across clusters
+ of commonly available computers. These clusters can number
+ into the thousands of nodes.
+ * Efficient: By distributing the data, Hadoop can process it in parallel
+ on the nodes where the data is located. This makes it
+ extremely rapid.
+ * Reliable: Hadoop automatically maintains multiple copies of data and
+ automatically redeploys computing tasks based on failures.
+ .
The Task Tracker is the Hadoop service that accepts MapReduce tasks and
computes results. Each node in a Hadoop cluster that should be doing
computation should run a Task Tracker.
@@ -121,34 +203,90 @@
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
-Description: Job Tracker for Hadoop
- The jobtracker is a central service which is responsible for managing
- the tasktracker services running on all nodes in a Hadoop Cluster.
- The jobtracker allocates work to the tasktracker nearest to the data
+Description: data-intensive clustering framework - Job Tracker
+ Hadoop is a software platform for writing and running applications
+ that process vast amounts of data on a distributed file system.
+ .
+ Here's what makes Hadoop especially useful:
+ * Scalable: Hadoop can reliably store and process petabytes.
+ * Economical: It distributes the data and processing across clusters
+ of commonly available computers. These clusters can number
+ into the thousands of nodes.
+ * Efficient: By distributing the data, Hadoop can process it in parallel
+ on the nodes where the data is located. This makes it
+ extremely rapid.
+ * Reliable: Hadoop automatically maintains multiple copies of data and
+ automatically redeploys computing tasks based on failures.
+ .
+ The Job Tracker is a central service responsible for managing
+ the Task Tracker services running on all nodes in a Hadoop cluster.
+ The Job Tracker allocates work to the Task Tracker nearest to the data
with an available work slot.

Package: hadoop-namenoded
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
-Description: Name Node for Hadoop
+Description: data-intensive clustering framework - name node
+ Hadoop is a software platform for writing and running applications
+ that process vast amounts of data on a distributed file system.
+ .
+ Here's what makes Hadoop especially useful:
+ * Scalable: Hadoop can reliably store and process petabytes.
+ * Economical: It distributes the data and processing across clusters
+ of commonly available computers. These clusters can number
+ into the thousands of nodes.
+ * Efficient: By distributing the data, Hadoop can process it in parallel
+ on the nodes where the data is located. This makes it
+ extremely rapid.
+ * Reliable: Hadoop automatically maintains multiple copies of data and
+ automatically redeploys computing tasks based on failures.
+ .
The Hadoop Distributed Filesystem (HDFS) requires one unique server, the
- namenode, which manages the block locations of files on the filesystem.
+ Name Node, which manages the block locations of files on the file system.

Package: hadoop-secondarynamenoded
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
-Description: Secondary Name Node for Hadoop
- The Secondary Name Node is responsible for checkpointing file system images.
- It is _not_ a failover pair for the namenode, and may safely be run on the
+Description: data-intensive clustering framework - secondary name node
+ Hadoop is a software platform for writing and running applications
+ that process vast amounts of data on a distributed file system.
+ .
+ Here's what makes Hadoop especially useful:
+ * Scalable: Hadoop can reliably store and process petabytes.
+ * Economical: It distributes the data and processing across clusters
+ of commonly available computers. These clusters can number
+ into the thousands of nodes.
+ * Efficient: By distributing the data, Hadoop can process it in parallel
+ on the nodes where the data is located. This makes it
+ extremely rapid.
+ * Reliable: Hadoop automatically maintains multiple copies of data and
+ automatically redeploys computing tasks based on failures.
+ .
+ The secondary Name Node is responsible for checkpointing file system images.
+ It is _not_ a failover partner for the Name Node, and may safely be run on the
same machine.

Package: hadoop-datanoded
Section: misc
Architecture: all
Depends: ${misc:Depends}, hadoop-daemons-common (= ${binary:Version})
-Description: Data Node for Hadoop
+Description: data-intensive clustering framework - data node
+ Hadoop is a software platform for writing and running applications
+ that process vast amounts of data on a distributed file system.
+ .
+ Here's what makes Hadoop especially useful:
+ * Scalable: Hadoop can reliably store and process petabytes.
+ * Economical: It distributes the data and processing across clusters
+ of commonly available computers. These clusters can number
+ into the thousands of nodes.
+ * Efficient: By distributing the data, Hadoop can process it in parallel
+ on the nodes where the data is located. This makes it
+ extremely rapid.
+ * Reliable: Hadoop automatically maintains multiple copies of data and
+ automatically redeploys computing tasks based on failures.
+ .
The Data Nodes in the Hadoop Cluster are responsible for serving up
blocks of data over the network to Hadoop Distributed Filesystem
(HDFS) clients.
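
As a usage note, the diff above uses a one-level path prefix
("hadoop.old/..." vs "hadoop/..."), so it should apply from inside the
unpacked source tree; assuming it is saved as /tmp/hadoop-review.diff
(an illustrative name only):

  # strip the first path component and apply the review in place
  patch -p1 < /tmp/hadoop-review.diff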