I tried to connect to HDFS from Eclipse, but it reports an "Error: failure to login" error.
Here is the fix I found online:
--------------------------------------------------------------
Copy the five jars commons-configuration-1.6.jar, commons-httpclient-3.0.1.jar, commons-lang-2.4.jar, jackson-core-asl-1.0.1.jar and jackson-mapper-asl-1.0.1.jar from the HADOOP_HOME/lib directory
into the lib directory inside hadoop-eclipse-plugin-0.20.203.0.jar, then edit MANIFEST.MF in the jar's META-INF directory and change the classpath entry to:
Bundle-ClassPath: classes/, lib/hadoop-core.jar, lib/commons-cli-1.2.jar, lib/commons-httpclient-3.0.1.jar, lib/jackson-core-asl-1.0.1.jar, lib/jackson-mapper-asl-1.0.1.jar, lib/commons-configuration-1.6.jar, lib/commons-lang-2.4.jar
--------------------------------------------------------------
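A sketch of how that repacking can be done from the command line (ECLIPSE_HOME and HADOOP_HOME are placeholders for the actual install paths; the jar names follow the quote above):

# unpack the plugin, drop in the extra jars, fix the manifest, repack
mkdir /tmp/plugin && cd /tmp/plugin
jar xf "$ECLIPSE_HOME/plugins/hadoop-eclipse-plugin-0.20.203.0.jar"
cp "$HADOOP_HOME"/lib/commons-configuration-1.6.jar \
   "$HADOOP_HOME"/lib/commons-httpclient-3.0.1.jar \
   "$HADOOP_HOME"/lib/commons-lang-2.4.jar \
   "$HADOOP_HOME"/lib/jackson-core-asl-*.jar \
   "$HADOOP_HOME"/lib/jackson-mapper-asl-*.jar  lib/
vi META-INF/MANIFEST.MF      # set Bundle-ClassPath to the list quoted above
zip -r ../hadoop-eclipse-plugin-0.20.203.0.jar .    # a jar is just a zip archive
cp ../hadoop-eclipse-plugin-0.20.203.0.jar "$ECLIPSE_HOME/plugins/"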
I tried this method, but it still doesn't work.
------ Solution --------------------------------------------
First delete the original plugin jar from Eclipse/plugins, start Eclipse, then close it. After that, copy the modified plugin into Eclipse/plugins/ and start Eclipse again; that should do it. The old plugin was probably cached by Eclipse.
------ For reference only ---------------------------------------
Keep trying, then.
Friday, August 23, 2013
Hadoop Streaming cannot be used to call a shell command
I run the command: hadoop jar contrib/streaming/hadoop-streaming-1.0.0.jar -input input -output output -mapper /bin/cat -file test.sh -reducer test.sh
The shell script test.sh is:
#!/bin/bash
hadoop fs -get input/1.avi ~/hadoop-1.0.0/tmp/1.avi
The error message is:
12/06/01 16:31:25 ERROR streaming.StreamJob: Job not successful. Error: # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201206011340_0024_r_000000
12/06/01 16:31:25 INFO streaming.StreamJob: killJob ...
Streaming Job Failed!
------ Solution --------------------------------------------
The reduce tasks are failing. You can see the specific failure stack on the JobTracker web console at http://jobmaster:port; the port is probably 50070 or 50030.
------ For reference only ---------------------------------------
Finally someone responded.
The problem is that the reduce script calls hadoop fs -get; if I change it to something like grep or cat instead, everything works fine.
The reduce task's exit code is non-zero, Hadoop retries it four times by default, and after the fourth failure the whole job is reported as failed.
Does anyone have experience calling hadoop commands from a shell script in streaming? How do you solve this? Any advice is appreciated.
------ For reference only ---------------------------------------
I looked at the error as you suggested, but I still can't tell where the problem is. I'm a Hadoop novice; could someone help me see what's wrong?
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 255
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
------ For reference only ---------------------------------------
After a lot of struggling the problem is solved: the task could not find the hadoop command, so you need to export the Hadoop environment variables inside the shell script.
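A sketch of what that might look like in test.sh (the HADOOP_HOME and JAVA_HOME values are guesses based on the paths mentioned above; adjust them to the actual install):

#!/bin/bash
# streaming tasks run with a minimal environment, so make the hadoop
# launcher visible on PATH before calling it
export HADOOP_HOME=~/hadoop-1.0.0            # assumed install location
export JAVA_HOME=/usr/lib/jvm/java-6-sun     # assumed JDK location
export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH

hadoop fs -get input/1.avi "$HADOOP_HOME/tmp/1.avi"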
------ For reference only ---------------------------------------
That's good then.
Thursday, August 22, 2013
Can the number of Hadoop TaskTrackers be controlled?
As the title says: Hadoop is already deployed and running. Can I set the number of TaskTrackers I need from code? For example, my cluster has 10 machines all working; I only want one job to run on 5 of them. How can I achieve that? Please don't tell me to change the configuration files and restart the cluster...
Similarly, can the number of DataNodes be controlled?
------ Solution --------------------------------------------
Waiting for answers too.
------ Solution --------------------------------------------
Is CSDN this shallow?
------ Solution --------------------------------------------
It can be set; the system uses a default value, and it can be changed through a method call on a configuration class.
------ Solution --------------------------------------------
I only know that it can be set; I don't know the concrete way.
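For what it's worth, one approach that works on a running Hadoop 1.x cluster without a full restart is to decommission the nodes you don't want (a sketch; the exclude-file path and hostnames are placeholders, and it assumes dfs.hosts.exclude / mapred.hosts.exclude already point at that file in hdfs-site.xml / mapred-site.xml):

# list the hosts to take out of service, one per line
printf "slave06\nslave07\nslave08\nslave09\nslave10\n" > /opt/hadoop/conf/excludes
# ask the NameNode and JobTracker to re-read the exclude file; the listed
# DataNodes are decommissioned and their TaskTrackers stop getting tasks
hadoop dfsadmin -refreshNodes
hadoop mradmin -refreshNodes

Note that this shrinks the whole cluster rather than limiting a single job; restricting one particular job to 5 machines is not something the stock schedulers support directly.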
------ For reference only ---------------------------------------
How come nobody answers...
./hadoop namenode -format always fails under Windows (pseudo-distributed)
likos@likos-PC /cygdrive/d/hadoop/run/bin
$ ./hadoop namenode -format
12/03/09 16:26:26 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = likos-PC/192.168.0.119
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.203.0
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/br
************************************************************/
Re-format filesystem in \tmp\hadoop-likos\dfs\name ? (Y or N) n
Format aborted in \tmp\hadoop-likos\dfs\name
12/03/09 16:27:02 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at likos-PC/192.168.0.119
************************************************************/
The above is copied verbatim. What is going on here? Hoping to get this solved.
In this state ./start-all.sh only starts the JobTracker; jps shows that the NameNode, DataNode, SecondaryNameNode and TaskTracker do not come up.
------ Solution --------------------------------------------
If you are just learning casually, Windows is fine.
But if you want to do real testing or run a cluster, do it on Linux.
You can first set the tmp path in the configuration file, create that folder, and then re-format.
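A sketch of that suggestion under Cygwin (the directory below is a placeholder; the property is hadoop.tmp.dir in conf/core-site.xml). Note that in the log above the format was aborted because "n" was entered at the prompt, so answer with a capital Y:

# create a tmp directory Hadoop can actually write to
mkdir -p /cygdrive/d/hadoop/tmp
# point hadoop.tmp.dir at it in conf/core-site.xml, e.g.
#   <property>
#     <name>hadoop.tmp.dir</name>
#     <value>/cygdrive/d/hadoop/tmp</value>
#   </property>
# then re-format, answering Y at the prompt, and restart
./hadoop namenode -format
./start-all.sh && jps      # all five daemons should now show up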
------ For reference only ---------------------------------------
Clearly the path is wrong.
------ For reference only ---------------------------------------
Thanks; the problem is still not resolved, though.
Wednesday, August 21, 2013
HBase configuration problem
I recently started learning Hadoop.
I configured a Hadoop, HBase and ZooKeeper cluster on Red Hat. The Hadoop cluster starts, and the ZooKeeper cluster starts without any problem.
But after HBase starts, entering list in the shell throws org.apache.hadoop.hbase.MasterNotRunningException: null.
I searched online; my Hadoop core-site.xml and the corresponding HBase settings match. When I start HBase, both master and slaves report that they started and show no errors, but jps shows that HMaster is not actually running. The HBase log shows this at every start:
org.apache.hadoop.hbase.HMaster: unhandled exception
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI. This error keeps the master from starting. Searching suggests the host could not be resolved, possibly because reverse DNS lookup of the host fails. But I can ping every machine by both IP and hostname, my configuration files all use hostnames rather than IPs, and I have added the corresponding entries to /etc/hosts. I also replaced the hadoop-core jar in HBase's lib directory, so Hadoop/HBase compatibility should not be the problem. Before I scaled out to more nodes, single-node Hadoop and HBase worked fine. Where on earth is the problem? Hoping an expert can point the way.
------ Solution --------------------------------------------
The problem has been solved: it was a virtual machine issue. I was using VMware before; after switching to VirtualBox the error is gone and HBase works normally...
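For anyone who hits the same exception, a quick sanity check of forward and reverse name resolution on each node might look like this (the hostname master01 is a placeholder; the exception itself can have other causes, so treat this only as a first check):

hostname                     # what the machine thinks it is called
hostname -i                  # the address that name resolves to
grep master01 /etc/hosts     # repeat for every node in the cluster
ping -c 1 master01
# on cluster nodes the hostname should not resolve to 127.0.0.1 / localhost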
------ For reference only ---------------------------------------
Same problem here, very frustrating!
------ For reference only ---------------------------------------
Did you ever solve it? I've been busy with work lately and couldn't keep studying the cluster. I plan to reconfigure HBase as a single node while keeping Hadoop and ZooKeeper as clusters, and try again.
------ For reference only ---------------------------------------
Thanks. I also use VMware; it looks like I should install VirtualBox and try that. VMware has given me no end of trouble.
------ For reference only ---------------------------------------
java.net.URISyntaxException: Relative path in absolute URI:
Pig delimiter issue on Hadoop
The column delimiter of test.txt is Ctrl+A, i.e. \001.
Running raw = LOAD '.../test.txt' USING PigStorage('\001') AS (a, b, c);
fails because PigStorage does not seem to recognize \001.
Questions:
1. Without changing the file's delimiter, how can this be solved?
2. If the delimiter (not necessarily \001) is given as a regular expression, how should the LOAD statement be written?
------ Solution --------------------------------------------
1. Use sed to replace '\001' with a delimiter Pig recognizes.
2. raw = LOAD '.../test.txt' USING PigStorage('your delimiter here') AS (a, b, c);
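If rewriting the file is acceptable, a sketch of that pre-processing route with tr instead of sed (the HDFS paths are placeholders). Some Pig versions are also reported to accept a Unicode escape such as PigStorage('\\u0001') directly, which would avoid the rewrite, but check your version before relying on it:

# stream the file out of HDFS, turn Ctrl-A (\001) into tabs, write it back
hadoop fs -cat /data/test.txt | tr '\001' '\t' | hadoop fs -put - /data/test_tab.txt
# then load it with PigStorage's default tab delimiter:
#   raw = LOAD '/data/test_tab.txt' AS (a, b, c);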
------ For reference only ---------------------------------------
Please take a look at this newbie's question, thanks.
Even if you don't know the answer, at least bump the thread!!
------ For reference only ---------------------------------------
------ For reference only ---------------------------------------
Does really nobody know the answer to this???
------ For reference only ---------------------------------------
Closing the thread; nobody answered, so let's end it here.
Tuesday, August 20, 2013
Data synchronization issue with a distributed cache (memcached)
Two machines: one is the memcached server, the other is the logic server.
The logic server only performs logic operations; memcached caches the user data. When the logic server needs data, it fetches it from memcached and caches it locally for a moment, modifies the user data, saves it back to the memcached server, and then clears the local copy.
So there is a problem: if multiple threads on the logic server modify the same user's data at the same time, there is a race and the data in memcached may be overwritten.
My idea is to add a per-user lock on the logic-server side: to modify a user's data, first acquire that user's lock, then fetch the data from memcached, modify it, save it back to memcached, and finally release the lock.
I feel this may be inefficient. Is there a better way to do this? Please advise!
------ Solution --------------------------------------------
Hash the user data by user name modulo the number of threads, and let each thread handle only its own users' data.
------ For reference only ---------------------------------------
To add: the logic server is written in Java.
------ For reference only ---------------------------------------
What I want to solve is exactly this data synchronization problem.
------ For reference only ---------------------------------------
Different kinds of threads may modify the same data.
Why not use the database's GROUP BY statement instead of Hadoop MapReduce to do this?
SELECT Key,SUM(Value) FROM Split
GROUP BY Key
Is it because of performance, or for other reasons?
------ Solution --------------------------------------------
Probably because MapReduce is just a programming model: it lets programmers run their own distributed, parallel programs on a distributed system without worrying about the distribution details. What it mainly implements here is exactly a distributed, parallel GROUP BY.
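The correspondence is easy to see in miniature with ordinary shell tools. The pipeline below is the same GROUP BY Key / SUM(Value) expressed as map, shuffle (sort) and reduce steps, which MapReduce simply runs in a distributed, parallel way (the file name and tab-separated layout are assumptions):

# map: emit (Key, Value) pairs; sort plays the role of the shuffle,
# bringing equal keys together; the last awk is the reduce: SUM per key
awk -F'\t' '{print $1 "\t" $2}' split.tsv |
  sort -k1,1 |
  awk -F'\t' '{sum[$1] += $2} END {for (k in sum) print k "\t" sum[k]}'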
Monday, August 19, 2013
Example
The Hadoop wordcount example won't run; the error is shown below:
Java.Net.ConnectException: Call to localhost/127.0.0.1:9001 failed on connection exception:
Seeking guidance.
------ Solution --------------------------------------------
Hadoop is not configured correctly.
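A few quick checks for that 9001 connection failure (a sketch; localhost:9001 comes from the error above and should match mapred.job.tracker in conf/mapred-site.xml):

jps                                        # is a JobTracker process running at all?
netstat -tln | grep 9001                   # is anything listening on port 9001?
cat conf/mapred-site.xml                   # mapred.job.tracker should be host:9001
tail -n 50 logs/hadoop-*-jobtracker-*.log  # if it died, the reason is in here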
------ For reference only ---------------------------------------
------ For reference only ---------------------------------------
------ For reference only ---------------------------------------
Has anyone actually used this?
------ For reference only ---------------------------------------
This part shows up now,
but the JobTracker still doesn't show.
Beginner Hadoop question, a small doubt
While reading about Hadoop I looked at HDFS and have a question: why does HDFS have one NameNode and many DataNodes, i.e. why is there only one master?
------ Solution --------------------------------------------
1. With only one NameNode there is no NameNode-to-NameNode communication, which is fast!
2. The NameNode keeps only a small amount of information, so all of it can be held in memory and queries are fast!
3. Because the NameNode keeps only the most critical metadata rather than the actual data, one node is sufficient!
4. Even if the NameNode crashes, the secondary NameNode can be used to recover.
A client asks the master to read or write a file in some directory; the master looks up in memory which DataNodes hold the file and returns their addresses; the client then talks to the DataNodes directly. That saves the master's network bandwidth, and since everything is in memory its efficiency is high too.
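You can see this division of labor from the command line: fsck asks the NameNode only for metadata, i.e. the block list and which DataNodes hold each replica, while the file bytes themselves flow directly between client and DataNodes (a sketch; the path is a placeholder):

hadoop fsck /user/test/somefile.txt -files -blocks -locations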
------ Solution --------------------------------------------
For example, one NameNode versus several is like a leadership mechanism: 1. centralized leadership is efficient; 2. multiple leaders give better stability and fault tolerance!
Sunday, August 18, 2013
Hadoop beginner
I have just followed the online documentation and built a distributed cluster. How should I learn Hadoop from here?
------ Solution --------------------------------------------
Strongly recommended classic book for learning Hadoop:
Hadoop: The Definitive Guide, Second Edition.
Once HDFS is built, read about its principles and get familiar with its Java interface by writing a Java program that reads a file.
Then MapReduce: set it up, read about its principles, get familiar with its interface, and write simple Java programs for simple tasks such as finding characters in files and counting their frequency.
Then HBase: read its principles, get familiar with its interfaces, and implement basic table creation and deletion, and record insert, query and delete in Java.
The principles behind these three can be studied alongside Google's three cloud-computing classics:
GFS, MapReduce, and BigTable.
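Once the cluster is up, the bundled wordcount example is a good first end-to-end run before moving on to the Java API (a sketch; the examples jar name varies with the Hadoop version):

hadoop fs -mkdir input
hadoop fs -put $HADOOP_HOME/conf/*.xml input         # any text files will do
hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount input output
hadoop fs -cat 'output/part-*' | head                # word counts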
------ For reference only ---------------------------------------
I have already built the cluster, so I can go straight to learning Map/Reduce.
------ For reference only ---------------------------------------
So I just search for API example code and learn from it??
------ For reference only ---------------------------------------
Curious about this too.
------ For reference only ---------------------------------------
Does Hadoop support VHD?
------ For reference only ---------------------------------------
Could you send me that material? zyn-xiao3@163.com
How do I compile a C++ program for Hadoop?
My Hadoop installation directory is /usr/src/hadoop-0.20.2/.
How do I compile the wordcount_sample.cc program that comes with Hadoop?
------ Solution --------------------------------------------
Your compile command is wrong.
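Assuming the program uses the Hadoop Pipes C++ API, a compile-and-run sketch for 0.20.2 might look like this (the platform directory name and HDFS paths are placeholders; some builds also need -lssl or -lcrypto):

HADOOP_HOME=/usr/src/hadoop-0.20.2
PLATFORM=Linux-i386-32        # or Linux-amd64-64, depending on the machine
g++ wordcount_sample.cc -o wordcount_sample \
    -I$HADOOP_HOME/c++/$PLATFORM/include \
    -L$HADOOP_HOME/c++/$PLATFORM/lib \
    -lhadooppipes -lhadooputils -lpthread
# the binary must live in HDFS so the tasks can fetch it
hadoop fs -mkdir bin
hadoop fs -put wordcount_sample bin/wordcount_sample
hadoop pipes -D hadoop.pipes.java.recordreader=true \
             -D hadoop.pipes.java.recordwriter=true \
             -input input -output output \
             -program bin/wordcount_sample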
hadoop fs -put error
start-all starts normally, the firewall is off, and safe mode has also been turned off. This is a pseudo-distributed setup under Cygwin on Windows. A -put operation still reports the error below. Experts, please advise; I have spent a whole afternoon on this and am really out of ideas.
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/dz64/input4/wordcount.txt could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
------ Solution --------------------------------------------
PS: first check whether all the processes actually started...
My guess is that your DataNode process did not start successfully.
Then re-format.
When re-formatting, remember to delete the data generated by the previous format first.
------ Solution --------------------------------------------
Look in the DataNode logs; the errors will be there. The DataNode did not start successfully.
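A sketch of that recovery procedure for a pseudo-distributed setup (the tmp path below is a placeholder for whatever hadoop.tmp.dir / dfs.data.dir point to in your configuration):

bin/stop-all.sh
# wipe the old NameNode/DataNode data so the namespace IDs match again
rm -rf /tmp/hadoop-$(whoami)/*
bin/hadoop namenode -format
bin/start-all.sh
jps        # expect NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker
# if the DataNode is still missing, logs/hadoop-*-datanode-*.log says why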
Saturday, August 17, 2013
Can Hadoop be used as a service for real-time processing?
This post was last edited by chaijunkun on 2012-03-16 11:24:18
I may have more questions than answers here, and my Hadoop knowledge is shaky; I only know Hadoop is a framework for distributed processing of large data sets. But now I have a question:
Hadoop is widely used for log processing, yet logs are generated continuously. As a map-reduce application, can Hadoop do real-time log analysis and processing? Or can it only handle logs whose content no longer changes (for example, yesterday's or last month's)? I hope the experts can give an answer; if my description isn't clear enough, please ask and I'll add detail.
------ Solution --------------------------------------------
I don't think it can. Hadoop has never emphasized response time; what it emphasizes is high throughput.
------ Solution --------------------------------------------
Each time Hadoop starts, the log/edit files are merged into the data to be updated, possibly together with the SecondaryNameNode, and not every operation updates the data immediately. That's roughly what I remember reading in a book; I don't have the book at hand, sorry... This answer is for reference only.
------ Solution --------------------------------------------
You've hit the key point. Hadoop is aimed at batch jobs; its key characteristic is data throughput, but once submitted a job can no longer be changed, and sometimes a single job may run for several hours.
What you describe is really the problem of real-time MapReduce, which is one focus of current research. You could look at Twitter's Storm framework, which is designed specifically for large real-time data streams.
Writing this up isn't easy, give me some points ~
------ Solution --------------------------------------------
Hadoop cannot do real-time processing; the response time is too slow, and if you used it for an interactive application system it would feel dead.
------ For reference only ---------------------------------------
Don't know; just passing by and learning.
------ For reference only ---------------------------------------
Thanks to "maxim_sin" for the answer. Indeed, as a high-throughput computing framework, asking it to be real-time is a stretch. I tend to agree with your point that "sometimes a job may run for a few hours": after a job starts, its work may never catch up, so how could it handle data arriving in real time? Some might say "just size the Hadoop cluster for the project so it can keep up", but don't forget the data volume has limits too. If the Hadoop cluster has only a few machines and the data volume is too large, processing can't finish in real time, the backlog keeps piling up, and it never clears. To avoid that you can only let it process a bounded amount of data. I'll also give points to the other users who answered, depending on their replies, and close the thread.
Distributed parallel computer design
1. (75 points) For future high-performance computing needs, give a design proposal for a 100 Pflops computer. Requirements:
(1) It must include at least the architecture, core hardware, job scheduling system and system management software, with detailed design for each part.
(2) It must include an energy-efficient, power-saving design.
(Answer to question 1)
2. (25 points) For the college's high-performance cluster subset Node601-610 (10 nodes in total; any 10 nodes from 610-676 are also acceptable), give a concrete Linpack test plan and the test results.
(Refer to the system description at http://csgrid.whu.edu.cn)
(Answer to question 2)
Anyone?
------ Solution --------------------------------------------
These shouldn't be interview questions, and they're not bar-exam questions either.
They look like paper topics...
------ Solution --------------------------------------------
Never mind the first question; the key is the second one. The ssh connections between the school's machines are broken, so there is simply no way to deploy the test program from a login node. And we were never given the root account.
------ Solution --------------------------------------------
Same here, the teacher is too harsh.
------ Solution --------------------------------------------
Really can't cope with this...
------ Solution --------------------------------------------
Looking for a reference too ~
------ For reference only ---------------------------------------
This is our distributed parallel computing final exam. Way too challenging; the key problem is that my research isn't in this direction at all.
------ For reference only ---------------------------------------
Finally finished it!
About Hadoop performance testing
I set up a small distributed cluster of three machines, hdp0, hdp1 and hdp2, with hdp0 as the NameNode and the other two as DataNodes. Formatting and startup after the build were normal, but when I run the TestDFSIO benchmark it reaches the point below and just hangs there. Seeking guidance, thanks.
hadoop@hdp0:~/hadoop-0.20.203.0$ bin/hadoop jar hadoop-test-0.20.203.0.jar TestDFSIO -read -nrFiles 10 -fileSize 100
TestDFSIO.0.0.4
12/03/27 00:26:05 INFO fs.TestDFSIO: nrFiles = 10
12/03/27 00:26:05 INFO fs.TestDFSIO: fileSize (MB) = 100
12/03/27 00:26:05 INFO fs.TestDFSIO: bufferSize = 1000000
12/03/27 00:26:06 INFO fs.TestDFSIO: creating control file: 100 mega bytes, 10 files
12/03/27 00:26:07 INFO fs.TestDFSIO: created control files for: 10 files
------ Solution --------------------------------------------
Did you ever solve this? I hit the same problem and don't know why either, uh.
------ Solution --------------------------------------------
Uh... never mind, it suddenly sorted itself out... It's just slow; what looks stuck will finish if you wait long enough.
------ Solution --------------------------------------------
But I still don't really understand it. The command I used was $ bin/hadoop jar hadoop-test-0.20.203.0.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
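For reference, a typical TestDFSIO session on this version writes first, then reads what it wrote, and leaves its summary in a local log file (a sketch; the jar name matches the one quoted above):

# write 10 files of 100 MB each, then read them back
bin/hadoop jar hadoop-test-0.20.203.0.jar TestDFSIO -write -nrFiles 10 -fileSize 100
bin/hadoop jar hadoop-test-0.20.203.0.jar TestDFSIO -read  -nrFiles 10 -fileSize 100
cat TestDFSIO_results.log      # throughput and average IO rate summary
# remove the benchmark files from HDFS when done
bin/hadoop jar hadoop-test-0.20.203.0.jar TestDFSIO -clean

Note that the read test reads the files created by a previous write run, so run -write first.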
[Urgent ~] Hadoop capacity-scheduler queue scheduling problem, please lend a hand, friends ~
This post was last edited by zhang__bing on 2013-08-01 11:28:49
If I configure three queues A, B and C with capacities 30%, 40% and 30%, and then submit two jobs (1 and 2) to B, where job1 needs 40% of the resources and job2 needs 20%, then under the capacity scheduler job1 gets B's share while job2 ends up taking resources from A.
I think that when the queue's resources are not enough, the second job should instead be blocked, waiting until job1 finishes before it runs.
Can capacity queues be configured to strictly cap the resources they are allocated?
Or is there another Hadoop queue scheduler that achieves this strict resource allocation?
Really urgent ~
Please help out, friends ~
------ Solution --------------------------------------------
1. Within each queue the Capacity Scheduler uses a FIFO scheduling policy.
2. By default the Capacity Scheduler does not use priorities, but you can enable them in the configuration file; with priorities enabled the algorithm becomes FIFO with priorities.
3. The Capacity Scheduler does not support preemption: once a job starts executing, its resources are not taken away by higher-priority jobs before it finishes.
4. Within a queue, the Capacity Scheduler limits the percentage of resources that jobs submitted by the same user can obtain, so one user's jobs cannot monopolize the queue.
Reconfigure points 2, 3 and 4 according to your needs. When the configuration is done, restarting the JobTracker is enough:
stop-mapred.sh
start-mapred.sh
Note that when submitting the job, remember to set the job configuration to specify which queue or pool it goes to.
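A sketch of what that looks like on a Hadoop 1.x capacity scheduler (queue names follow the question; the job jar and class are placeholders, and the -D form only works if the job parses generic options). The maximum-capacity property, which hard-caps a queue so it cannot borrow other queues' share, is reported to exist in later 1.x capacity-scheduler releases, so verify it against your version:

# conf/capacity-scheduler.xml: give queue B 40% and cap it at 40%
#   <property>
#     <name>mapred.capacity-scheduler.queue.B.capacity</name>
#     <value>40</value>
#   </property>
#   <property>
#     <name>mapred.capacity-scheduler.queue.B.maximum-capacity</name>
#     <value>40</value>
#   </property>
bin/stop-mapred.sh && bin/start-mapred.sh      # restart the JobTracker
# submit into queue B explicitly (same as conf.set("mapred.job.queue.name", "B") in code)
hadoop jar my-job.jar MyJob -Dmapred.job.queue.name=B input output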
------ For reference only ---------------------------------------
See: http://blog.csdn.net/jiedushi/article/details/7920455
------ For reference only ---------------------------------------
Although your answer doesn't fundamentally solve my problem, thank you anyway ~
Advantages of Hadoop compared to traditional clusters
I'm new to Hadoop. My own understanding is that in Hadoop the nodes work together to complete one task, while a traditional cluster receives a task and then assigns it to a single machine in the cluster, which handles the task alone.
Is my understanding correct?
------ Solution --------------------------------------------
To discuss this question we should first be clear about what a Hadoop cluster is. Quoting one description:
A Hadoop cluster is a specific type of cluster designed for storing and analyzing vast amounts of unstructured data. Essentially it is a computing cluster: the data analysis work is distributed across multiple cluster nodes, which process the data in parallel.
In that sense your understanding is right. Now for the advantages:
One: suitable for big data processing. Big data is generally widely distributed and unstructured. Hadoop is ideal for this kind of data because its way of working is to split the data into slices and assign each slice to a specific cluster node for analysis. The data doesn't need to be uniform, since each slice is processed individually on its own node.
Two: flexible scalability. As with any other kind of data, a central problem in big-data analysis is the ever-growing volume of data, and the biggest value of big data lies in real-time or near-real-time analysis. A Hadoop cluster speeds things up through parallel processing, but as the volume of data to analyze grows, the cluster's processing capacity can be strained. However, the cluster can be scaled out effectively just by adding more nodes.
Three: cost. Hadoop clusters are cheaper for two main reasons. The software they need is open source, which reduces costs; in fact you can download the Apache Hadoop distribution for free. At the same time, Hadoop clusters keep costs down by running on commodity hardware: you don't have to buy server-class hardware to build a capable Hadoop cluster. So, as it turns out, a Hadoop cluster really is a cost-effective solution.
Four: fault tolerance. When a data slice is sent to a node for analysis, copies of that data exist on other nodes in the cluster. Even if one node fails, extra copies of its data are still present elsewhere in the cluster, so the data can still be analyzed.
------ For reference only ---------------------------------------
Hadoop's outstanding advantages are in fact exactly where traditional BI falls short. Of course Hadoop is not perfect; it has some drawbacks too. One drawback:
the cluster approach assumes the data is "splittable" and can be processed in parallel, independently, on separate nodes. If the analysis doesn't fit that parallel-processing model, a Hadoop cluster is not the right tool for the job.
Another is that when the data volume is small, the cost of configuring, building, operating, maintaining and supporting a cluster just isn't worth it.
------ For reference only ---------------------------------------
This article is very good; it gives a comprehensive and in-depth grasp of Hadoop:
http://os.51cto.com/art/201211/364374.htm
Friday, August 16, 2013
Why does the Hadoop map phase finish quickly while reduce gets stuck for ten minutes?
Why is this?
------ Solution --------------------------------------------
If it's data skew, rewriting the Partitioner is the better fix ~ data skew is common in practice.
You can also check on the web UI whether it is stuck copying data; if so, it may be a machine problem or a network problem ~
------ Solution --------------------------------------------
The data may be skewed. Consider: 1. adding an appropriate combiner; 2. rewriting the partitioner. Of course, if your job also uploads files into HDFS, that can make reduce very slow too.
------ Solution --------------------------------------------
Does the reduce get stuck and then eventually finish successfully? The reduce code may also be wrong... What exactly do your map and reduce do?
------ Solution --------------------------------------------
1. Data skew: rewrite the partitioner.
2. The code is wrong. Go to jobtrack.jsp?jobid to see the execution progress, and check the logs.
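A quick way to check for skew from the command line, besides the JobTracker web UI mentioned above (a sketch; the path is the job's output directory): compare per-task run times and record counts, since one reduce task that ran far longer or consumed far more records than the others usually points to skewed keys.

hadoop job -history /user/$(whoami)/output
hadoop job -history all /user/$(whoami)/output    # per-task detail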
------ For reference only ---------------------------------------
It is data skew... During the reduce-side aggregation, if one kind of data is disproportionately large, the skew means all of that data is computed by a single reduce, which hurts overall performance.
------ For reference only ---------------------------------------
Then should this be solved at the data level? Since my data is randomly generated, wouldn't it be better to just regenerate the data?
------ Solution ---------------------------------------- ----
If it is data skew , rewritten Partitioner better ~ , data skew in practice is common.
You can see on the website is not being replicated data, if that is the case , it may be a problem of the machine or network problems ~
------ Solution -------- ------------------------------------
data may be inclined , consider a . construct the corresponding combiner, 2. rewrite partitioner. course, if you upload files in to hdfs may also cause reduce very slow
------ Solution -------- ------------------------------------
reduce stuck there after the last successful execution of it ? May reduce the code wrong. . . You to do is to map and reduce what action ?
------ Solution ---------------------------------------- ----
1. data skew , rewritten partitioner,
2. code is wrong. To jobtrack.jsp? Jobid see the progress of implementation , see the log
------ For reference only -------------------------- -------------
is data skew it. . . reduce process polymerization process , if a certain type of data will result in a disproportionately large tilt, this part of the data will be calculated at a reduce , affect the overall performance
------ For reference only ----- ----------------------------------
that this question should conform to solve it , and if My data is randomly generated, I should not re- generate the data better
Hadoop / Linux super-beginner question! (Installing Hadoop on Linux)
I've basically never used Linux and only recently started looking at Hadoop, so Linux or Hadoop gurus, please help a beginner get started!
I installed Linux (Ubuntu) in a VMware virtual machine, the simplest install, and it has network access. On the Hadoop download page the latest stable version is 1.0, but there are a lot of files: .rpm, .rpm.asc, .rpm.mds, .tar.gz, .deb, .deb.asc, .deb.mds and so on. What are these for? Which one should I download? After downloading, how do I lay out the directories and install it?
Please help!
------ Solution --------------------------------------------
Just download one of them; some are installer packages and some are compressed archives.
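For example, the tarball route might look like this (a sketch; the mirror path and version are assumptions, and the .asc/.mds files are just signatures and checksums for verifying the download):

wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.0/hadoop-1.0.0.tar.gz
sudo tar -xzf hadoop-1.0.0.tar.gz -C /usr/local
cd /usr/local/hadoop-1.0.0
# next steps: set JAVA_HOME in conf/hadoop-env.sh and edit the conf/*-site.xml files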
Hadoop startup error; I really can't find the reason online...
chaiying0@ubuntu:~$ /home/hadoop/hadoop-0.20.2/bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-namenode-ubuntu.out
/home/hadoop/hadoop-0.20.2/bin/hadoop-daemon.sh: line 117: /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-namenode-ubuntu.out: Permission denied
head: cannot open `/home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-namenode-ubuntu.out' for reading: No such file or directory
localhost: starting datanode, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-datanode-ubuntu.out
localhost: /home/hadoop/hadoop-0.20.2/bin/hadoop-daemon.sh: line 117: /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-datanode-ubuntu.out: Permission denied
localhost: head: cannot open `/home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-datanode-ubuntu.out' for reading: No such file or directory
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-secondarynamenode-ubuntu.out
localhost: /home/hadoop/hadoop-0.20.2/bin/hadoop-daemon.sh: line 117: /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-secondarynamenode-ubuntu.out: Permission denied
localhost: head: cannot open `/home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-secondarynamenode-ubuntu.out' for reading: No such file or directory
starting jobtracker, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-jobtracker-ubuntu.out
/home/hadoop/hadoop-0.20.2/bin/hadoop-daemon.sh: line 117: /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-jobtracker-ubuntu.out: Permission denied
head: cannot open `/home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-jobtracker-ubuntu.out' for reading: No such file or directory
localhost: starting tasktracker, logging to /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-tasktracker-ubuntu.out
localhost: /home/hadoop/hadoop-0.20.2/bin/hadoop-daemon.sh: line 117: /home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-tasktracker-ubuntu.out: Permission denied
localhost: head: cannot open `/home/hadoop/hadoop-0.20.2/bin/../logs/hadoop-chaiying0-tasktracker-ubuntu.out' for reading: No such file or directory
I feel there are two errors, one is permission denied, the other is no such file or directory. May I ask how to solve ah you greatly
------ Solution ---------------------------------- ----------
SSH configured it?
------ Solution ---------------------------------------- ----
should be a user problem, you say chaiying0 switch to hadoop users try, you chaiying0 users access files under hadoop is certainly no permission
------ For reference only - -------------------------------------
anxious ah. . . Manga you greatly. . .
------ For reference only -------------------------------------- -
Big Brother Big Sister Come!
------ For reference only -------------------------------------- -
You can try to stop the service.
then hadoop namenode-format
grid delete tmp before all of the following documents (including data nodes).
------ For reference only -------------------------------------- -
Also set the data and name directories (dfs.data.dir / dfs.name.dir) explicitly; don't rely on the defaults.
------ For reference only ------------------- --------------------
This problem is actually quite clear: your current user is not allowed to write into Hadoop's logs directory. Check with ls -al; the logs directory probably does not belong to the current user, so give it to chaiying0 with sudo chown chaiying0:chaiying0 logs.
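A minimal sketch of that check and fix, using the paths from the error messages above (adjust the user and group to your own):

ls -ld /home/hadoop/hadoop-0.20.2/logs                              # see who owns the logs directory
sudo chown -R chaiying0:chaiying0 /home/hadoop/hadoop-0.20.2/logs   # hand it to the user that runs start-all.sh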
What are the differences between HDFS and GPFS, and between CSM and xCAT?
1. What is the difference between HDFS and GPFS? After deploying GPFS, how would you build HDFS on top of it?
2. What is the difference between CSM and xCAT? Isn't xCAT built on top of CSM?
------ Solution ---------------------------------------- ----
1. HDFS (Hadoop Distributed File System) is a distributed file system;
GPFS (General Parallel File System) is a clustered parallel file system.
HDFS uses a master/slave architecture: an HDFS cluster consists of a single NameNode and a number of DataNodes.
2. I'm not sure about that one.
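As an aside, you can see that master/slave structure directly on a running cluster; a minimal sketch (output format varies a little by version):

hadoop dfsadmin -report                    # the NameNode lists its overall capacity and every DataNode it knows about
hadoop fsck / -files -blocks -locations    # shows which DataNodes hold the blocks of each file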
------ For reference only -------------------------------------- -
Learned something, thanks.
------ For reference only ------------------------------ ---------
------ For reference only ---------------------------------- -----
How should I understand the master/slave architecture in terms of NameNode/DataNode? Could you explain it at the level of the whole cluster? Thanks.
------ For reference only ------------------------------------- -
HDFS (Hadoop Distributed File System) is a distributed file system;
GPFS (General Parallel File System) is a clustered parallel file system.
Good to know.
------ For reference only ---------------------------- -----------
------ For reference only ---------------------------------- -----
Good to know.
Accessing HDFS from a Windows client
I'm a newcomer just learning Hadoop, and I want to build a large-file storage system on HDFS. The idea: the metadata describing each large file is kept in Oracle, while the file itself lives on HDFS. The client is developed on Windows with VS2010 and will be used on Windows XP and Win7.
I built a distributed Hadoop 1.2.0 environment on CentOS and can work with the data nodes from the command line, but I want a Windows XP client to access the HDFS file system: upload from the local Windows machine, download, and delete data in HDFS.
Is there some HDFS client package for Windows that exposes an interface for talking to HDFS? I haven't found anything concrete. I know about libhdfs for C/C++, but I don't know how to develop against it on Windows, or whether it can be used from VS2010 at all.
I've read a lot of material but still can't work out how to achieve this. Please give me some guidance; any help is appreciated.
------ Solution ---------------------------------------- ----
eclipse
------ For reference only ------------------------------ ---------
Experts, please give me some pointers: is the idea feasible at all, or can it not be done?
------ For reference only ------ ---------------------------------
Could you be more specific about how Eclipse lets me access HDFS?
Any concrete information would be much appreciated.
------ For reference only -------------------------------------- -
In addition, I saw a post,
http://www.front2end.cn/blog/access-Hadoop-with-C%23.html,
which says that Thrift can provide an access interface between Windows and HDFS. Does anyone know whether that is feasible?
------ For reference only --------------- ------------------------
Google turns up plenty on this.
------ For reference only -------------------------------------- -
Although I ended up going with a different approach, you were the only one who answered my question, so the points go to you.
What language is Hadoop itself developed in?
What language is Hadoop itself developed in?
Which operating systems can it run on?
And what language is its subproject HDFS developed in?
------ Solution ---------------------------------------- ----
It is implemented in Java and open source, and it supports Fedora, Ubuntu and other Linux platforms.
------ Solution ---------------------------------------- ----
What language is Hadoop itself developed in? ---- Java.
Which operating systems can it run on?
• GNU/Linux is the supported development and production platform; Hadoop has been verified on GNU/Linux clusters of 2000 nodes.
• Win32 is supported as a development platform only. Distributed operation has not been fully tested on Win32, so it is not supported as a production platform.
What language is its subproject HDFS developed in? ---- Java.
------ For reference only ----------------------------- ----------
I'd like a Windows version of Hadoop too; unfortunately there isn't one yet.
------ For reference only -------------------------------------- -
Applications on top of Hadoop are written in Java, but it doesn't look like Hadoop itself is Java... is it?
And my other questions weren't answered.
------ For reference only -------------------------------------- -
Hadoop is also supported on Windows.
------ For reference only --------------------------- ------------
There is currently no native Hadoop release for Windows;
you have to run it through Cygwin.
------ For reference only ---------------------------- -----------
And it is said to be unstable.
------ For reference only -------------------- -------------------
The replies above didn't answer my questions...
still looking for a reliable answer.
------ For reference only ---------------------------------------
Thanks ~ your answer matches what I found on my own ~
if you had answered earlier it would have saved me some effort, haha.
All the points go to you ~
------ For reference only ---------------------------------- -----
It turns out the second reply had already said this, just not in as much detail.
------ For reference only -------------- -------------------------
Thursday, August 15, 2013
Error running Eclipse under the hadoop account
This post was last edited by citylove on 2013-08-05 11:08:00
My Ubuntu machine has two accounts: an ordinary user, and a user I created recently for installing Hadoop. The problem is that Eclipse runs with no problems at all under the ordinary account, but under the hadoop account it errors out.
I installed Hadoop following this blog: http://www.cnblogs.com/tippoint/archive/2012/10/23/2735532.html
The error output:
$ eclipse &
[1] 28486
hadoop@city-ubuntu:/home/city$ No protocol specified
No protocol specified
(eclipse:28487): GLib-GObject-WARNING **: invalid (NULL) pointer instance
(eclipse:28487): GLib-GObject-CRITICAL **: g_signal_connect_data: assertion `G_TYPE_CHECK_INSTANCE (instance)' failed
** (eclipse:28487): WARNING **: The connection is closed
(eclipse:28487): Gtk-CRITICAL **: IA__gtk_settings_get_for_screen: assertion `GDK_IS_SCREEN (screen)' failed
(eclipse:28487): Gtk-WARNING **: Screen for GtkWindow not set; you must always set
a screen for a GtkWindow before using the window
(eclipse:28487): Gtk-WARNING **: Screen for GtkWindow not set; you must always set
a screen for a GtkWindow before using the window
(eclipse:28487): Gdk-CRITICAL **: IA__gdk_screen_get_display: assertion `GDK_IS_SCREEN (screen)' failed
(eclipse:28487): Gdk-CRITICAL **: IA__gdk_display_get_pointer: assertion `GDK_IS_DISPLAY (display)' failed
(eclipse:28487): Gtk-WARNING **: Screen for GtkWindow not set; you must always set
a screen for a GtkWindow before using the window
(eclipse:28487): Gdk-CRITICAL **: IA__gdk_screen_get_n_monitors: assertion `GDK_IS_SCREEN (screen)' failed
(eclipse:28487): Gtk-WARNING **: Screen for GtkWindow not set; you must always set
a screen for a GtkWindow before using the window
(eclipse:28487): Gdk-CRITICAL **: IA__gdk_screen_get_monitor_geometry: assertion `GDK_IS_SCREEN (screen)' failed
(eclipse:28487): Gdk-CRITICAL **: IA__gdk_screen_get_default_colormap: assertion `GDK_IS_SCREEN (screen)' failed
(eclipse:28487): Gdk-CRITICAL **: IA__gdk_colormap_get_visual: assertion `GDK_IS_COLORMAP (colormap)' failed
(eclipse:28487): Gdk-CRITICAL **: IA__gdk_screen_get_default_colormap: assertion `GDK_IS_SCREEN (screen)' failed
(eclipse:28487): Gdk-CRITICAL **: IA__gdk_screen_get_root_window: assertion `GDK_IS_SCREEN (screen)' failed
(eclipse:28487): Gdk-CRITICAL **: IA__gdk_screen_get_root_window: assertion `GDK_IS_SCREEN (screen)' failed
(eclipse:28487): Gdk-CRITICAL **: IA__gdk_window_new: assertion `GDK_IS_WINDOW (parent)' failed
Segmentation fault (core dumped)
[1]+  Exit 139                 eclipse
I was told to run xhost + as the ordinary user and try again; the result is shown below:
hadoop@city-ubuntu:/home/city$ xhost +city
No protocol specified
No protocol specified
xhost: unable to open display ":0.0"
Please advise; I'd be very grateful.
PS: at the moment I run xhost +localhost and then, from the ordinary account, ssh -X hadoop@localhost eclipse; Eclipse does start, but the Hadoop plugin I installed has no effect. Under the ordinary user I can see the MapReduce options; under the hadoop user I can't, even though the plugin files have 777 permissions.
------ Solution ---------------------------------------- ----
You need to pay attention to where Eclipse is installed. I ran into the same thing. I don't know where your Eclipse is installed, but I suggest reinstalling it: create an eclipse folder under /home and install it there by hand. If you install it casually you end up with a pile of problems that take longer to fix than a clean reinstall. My guess is that yours is installed inside the ordinary user's home directory, and the hadoop user has no permission to access that home directory. If it is installed under /home, every user can run it; I always use cd /home and then sudo ./eclipse/eclipse. Check whether that's your situation.
------ For reference only -------------------------------------- -
Try the ideas in these two articles; similar problems can usually be solved the same way, and it is generally effective:
1. http://blog.sina.com.cn/s/blog_7195429b0100qxl5.html
2. http://blog.csdn.net/muzizhuben/article/details/6735795
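The "No protocol specified" lines mean the X server is rejecting the hadoop user; xhost has to be run by the user who owns the display (city), not by hadoop, which is why it failed above. A hedged sketch of one common workaround, assuming eclipse is on the PATH:

# run these from the desktop user's (city's) own terminal session
xhost +local:                          # allow local users to connect to this X display
su - hadoop -c "DISPLAY=:0 eclipse"    # start eclipse as the hadoop user on city's display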
------ For reference only --------------- ------------------------
Your approach really did solve my earlier problem, but now there is a new one:
after installing the plugin, DFS Locations does not appear in the left sidebar; everything else is normal.
Please advise.
------ For reference only -------------------------------- -------
I can't confirm whether I'm actually running the Hadoop program from the Eclipse platform.
------ For reference only --------- ------------------------------
I just created a MapReduce project and the DFS Location showed up on its own... I'd been fiddling with this for ages!
Hadoop cloud disk system (web version)
I'm working on a Hadoop cloud-disk system that uses the HDFS Java API to write files into the HDFS file system.
I've already implemented upload, download, delete and view of files in plain Java, but while writing the web client (JSP, beans, servlets) I ran into some problems. What issues should I pay attention to? Please help.
------ Solution ---------------------------------------- ----
There are many issues that need attention,
such as web load balancing and high-concurrency access; some of the problems may have nothing to do with Hadoop at all.
You should write down the specific problems you hit, so they can be analyzed concretely.
------ Solution ------------------------------------ --------
fs.default.name is not set in your Configuration; if the web app's classpath doesn't contain the cluster's configuration files, new Configuration() falls back to the local file system.
Also, what exactly goes wrong when Tomcat starts?
------ For reference only -------------------------------------- -
The code of each file:
upload.jsp:
<%@ page language="java" contentType="text/html; charset=GB18030"
    pageEncoding="GB18030"%>
HDFSOperation.java:
package servlet;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
public class HDFSOperation {
private Configuration conf;
private FileSystem fs;
    public HDFSOperation() throws IOException {
        conf = new Configuration();
        fs = FileSystem.get(conf);
    }
    public boolean upLoad(String filePath) {
        try {
            InputStream in = new BufferedInputStream(new FileInputStream(filePath));
            fs = FileSystem.get(URI.create("hdfs://localhost:9000/text"), conf);
            OutputStream out = fs.create(new Path("hdfs://localhost:9000/text"));
            IOUtils.copyBytes(in, out, 4096, true);
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
        return true;
    }
}
uploadServlet.java:
package servlet;
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
/**
 * Servlet implementation class uploadServlet
 */
@WebServlet("/uploadServlet")
public class uploadServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;
    /**
     * @see HttpServlet#doPost(HttpServletRequest request, HttpServletResponse response)
     */
    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        String filePath = request.getParameter("filePath");
        HDFSOperation hdfsOperation = new HDFSOperation();
        boolean flag = hdfsOperation.upLoad(filePath);
        if (flag) {
            response.sendRedirect("success.jsp");
        } else {
            response.sendRedirect("failure.jsp");
        }
    }
}
web.xml:
<display-name>huwellweb</display-name>
<servlet>
  <servlet-name>uploadServlet</servlet-name>
  <servlet-class>servlet.uploadServlet</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>uploadServlet</servlet-name>
  <url-pattern>/uploadServlet</url-pattern>
</servlet-mapping>
<welcome-file-list>
  <welcome-file>index.jsp</welcome-file>
</welcome-file-list>
------ For reference only --------------------------- ------------
When upload.jsp is requested, Tomcat won't even start properly.
Please help look over the code; there must be plenty wrong with it.
------ For reference only -------------------------------------- -
As the reply above says, set fs.default.name.
------ For reference only ----------------------- ----------------
I set fs.default.name in another project I downloaded, imported my own code into it together with the Hadoop jar packages, and then it worked. Thanks for the help.
------ For reference only -------------------------------------- -
That puts a lot of pressure on a newbie like me... much respect.
Hadoop runs WordCount with no response
I configured a Hadoop installation across 3 Linux systems running in virtual machines and started it with start-all.sh.
The error:
2011-07-02 05:22:20,702 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: hdfs://m:9000/home/hadoop/tmp/mapred/system
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /home/hadoop/tmp/mapred/system. Name node is in safe mode.
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
On the primary node I ran: [hadoop@m_184 ~]$ hadoop dfsadmin -safemode leave
and it showed: Safe mode is OFF
The directories
/wordcount/input and /wordcount/output
were created in HDFS with hadoop dfs -mkdir, and the input file was uploaded.
Now running WordCount gives no response:
[hadoop@m_184 ~]$ hadoop jar WordCount.jar WordCount /wordcount/input /wordcount/output
Why is this?
Thank you!
------ Solution ---------------------------------------- ----
The NameNode is in safe mode;
take it out of safe mode first.
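For reference, a quick way to check and clear safe mode from the shell; a minimal sketch, run on the NameNode as the hadoop user:

hadoop dfsadmin -safemode get      # prints whether safe mode is ON or OFF
hadoop dfsadmin -safemode leave    # force it off; only sensible once the DataNodes have reported their blocks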
------ For reference only ------------------------------- --------
Did that work? You can contact me; my QQ is 109492927.
How does the NameNode keep the latest directory tree up to date?
The NameNode writes modifications to the editlog and stores the directory tree in the fsimage, and the SecondaryNameNode merges the editlog into the fsimage only every 3600 s. So here is the question:
why can we always see the latest directory tree through the NameNode, even though the SecondaryNameNode's update lags by at least an hour?
------ Solution ------- -------------------------------------
The NameNode only loads the fsimage and replays the editlog at startup; after that, every modification is applied to the metadata held in memory, and the operation is also recorded in the editlog.
The SecondaryNameNode exists for two reasons: as a cold standby, and to shorten the editlog replay the next time the NameNode restarts.
The two things have no direct relationship.
Wednesday, August 14, 2013
Problem running WordCount on Hadoop!!!!
hadoop jar wordcount.jar org.myorg.WordCount ./wordcount/input ./wordcount/ouput
When running WordCount the console prints no information at all; it just keeps running.
I also turned Hadoop's safe mode off, but after running it is still "stuck".
Solved!
------ Solution ---------------------------------------- ----
A configuration file error prevented the DataNode process from starting.
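A quick way to confirm whether the DataNodes are actually up before blaming the job itself; a hedged sketch, assuming the default log location under $HADOOP_HOME/logs:

jps                                                    # a DataNode process should be listed on every slave node
hadoop dfsadmin -report                                # "Datanodes available: 0" matches the "replicated to 0 nodes" error below
tail -n 50 $HADOOP_HOME/logs/hadoop-*-datanode-*.log   # shows why the DataNode died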
------ For reference only ---------------------- -----------------
To add a question: when I run hadoop fs -put input1.txt input I get the following error:
11/10/12 16:06:29 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/supertool/input/input1.txt could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
at org.apache.hadoop.ipc.Client.call(Client.java:740)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
11/10/12 16:06:29 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
11/10/12 16:06:29 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/supertool/input/input1.txt" - Aborting...
put: java.io.IOException: File /user/supertool/input/input1.txt could only be replicated to 0 nodes, instead of 1
11/10/12 16:06:29 ERROR hdfs.DFSClient: Exception closing file /user/supertool/input/input1.txt: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/supertool/input/input1.txt could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/supertool/input/input1.txt could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
at org.apache.hadoop.ipc.Client.call(Client.java:740)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
------ For reference only ---------------------------------- -----
Could you say which configuration files were wrong, and in what way? I've run into the same problem.
IOCP out-of-order data problem.
I've been studying IOCP recently, following PiggyXP's example at http://blog.csdn.net/piggyxp/article/details/6922277
and converted his C++ code to Delphi.
But there is a problem on the server side when sending data (I'm testing data integrity): the client makes a request and the server sends it a file; the file the client receives has the correct size but the wrong content, and the compressed package that was sent cannot be opened.
I use WSASend for sending: the file is about 100 KB, sent 8192 bytes at a time; after each completion returns I post the next 8192 bytes via PostQueuedCompletionStatus until the transfer is finished. Short connections are used; each connection is closed after one interaction completes.
------ Solution ---------------------------------------- ----
------ For reference only ---------------------------------------
Also, if GetQueuedCompletionStatus returns success and the completed length equals the length I sent, does that mean the other end has received it successfully?
------ For reference only -------------------------------------- -
Whoever helps me gets the points.
Standalone Hadoop installation on Linux: the 50030 page fails to open; seeking help
The mapred-site configuration file is:
When I view port 50030
in a web browser, the page reports that the connection was denied.
------ Solution ------------------------- -------------------
Turn the firewall off.
Comment out everything in the /etc/hosts file, leaving only
127.0.0.1 localhost
and try again.
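If it still doesn't load, it helps to confirm from the server itself whether the JobTracker web UI is answering at all; a hedged sketch (replace 127.0.0.1 with the JobTracker's IP when testing from another machine):

jps                                              # JobTracker and TaskTracker should both appear in the list
curl -I http://127.0.0.1:50030/jobtracker.jsp    # jobtracker.jsp is the JobTracker UI's main page in Hadoop 1.x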
------ For reference only -------------------------------- -------
The 50070 page opens and shows information fine.
------ For reference only --------------------------- ------------
Partly solved, but after running start-mapred.sh,
jps shows the JobTracker has started;
however, visiting http://jobtracker:50030 just sits waiting for a response and the page never displays. Please help, experts.
------ Solution ---------------------------------------- ----
Try visiting it by IP instead: http://ip:50030
------ Solution ------------------ --------------------------
Are you sure the JobTracker and TaskTracker have actually started? Is the configuration correct? Also check the hosts file on your local Windows machine so the hostname resolves to the right IP.
------ Solution --------------------------- -----------------
Is this a distributed or a pseudo-distributed deployment? Check which machine the JobTracker address points to.
------ For reference only ------------------------------- --------
Still doesn't work... I tried that; the port is stuck in CLOSE_WAIT. What could cause that? Please enlighten me ~~
------ For reference only -------------- -------------------------
Take a look at the JobTracker log.
------ For reference only ----- ----------------------------------
There isn't even a log to open.
- ----- For reference only ---------------------------------------
The problem turned out to be inconsistent system times across the machines. What a pitfall...
Tuesday, August 13, 2013
Hadoop that was installed earlier suddenly won't run
Because of my own mistake, I carelessly ran start-all as the root user (it was originally started by the user I created for it).
Since then it won't start, so I decided to clear out the data and format again, but nothing changed.
I first deleted all the files under hadoop.tmp.dir, then deleted all of Hadoop's PID files under /tmp,
and finally formatted again.
Experts, please help me out.
The error log says a class cannot be found:
2012-12-28 15:54:16 org.apache.hadoop.mapred.JobTracker main
SEVERE: java.lang.NoClassDefFoundError: org/apache/log4j/AppenderSkeleton
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at org.apache.hadoop.metrics2.source.JvmMetricsSource.getEventCounters(JvmMetricsSource.java:185)
at org.apache.hadoop.metrics2.source.JvmMetricsSource.getMetrics(JvmMetricsSource.java:119)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:188)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:166)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:145)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:321)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:307)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:56)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:89)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:210)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:181)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.register(DefaultMetricsSystem.java:55)
at org.apache.hadoop.metrics2.source.JvmMetricsSource.create(JvmMetricsSource.java:102)
at org.apache.hadoop.metrics2.source.JvmMetricsSource.create(JvmMetricsSource.java:107)
at org.apache.hadoop.mapred.JobTrackerMetricsSource.<init>(JobTrackerMetricsSource.java:106)
at org.apache.hadoop.mapred.JobTrackerInstrumentation.create(JobTrackerInstrumentation.java:185)
at org.apache.hadoop.mapred.JobTrackerInstrumentation.create(JobTrackerInstrumentation.java:180)
at org.apache.hadoop.mapred.JobTracker.createInstrumentation(JobTracker.java:1683)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2207)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1888)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1882)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:311)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:302)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:297)
at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4788)
Caused by: java.lang.ClassNotFoundException: org.apache.log4j.AppenderSkeleton
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 50 more
------ Solution ------------------------------------- -------
You probably deleted the log4j jar files as well.
------ For reference only ----------------------- ----------------
I figured the class that couldn't be found was a log4j class, went to the lib folder, and sure enough the log4j jar was missing... added it back myself and it seems fine now.
Hadoop: where are the files listed by hadoop fs -ls actually stored?
hadoop fs -ls
I ran the command above,
and it shows the following result:
Found 1 items
drwxr-xr-x guoxiang supergroup 0 2013-03-13 14:43 /user/myhadoop/testdir
The listed path is /user/myhadoop/testdir; which location is that in the Linux file system? Can I cd to it in Linux?
When I created the directory I used
hadoop fs -mkdir testdir
so what is the default path?
I'm just starting to learn Hadoop and my understanding of these commands isn't thorough yet; any help is much appreciated.
------ Solution ----------------- ---------------------------
If you didn't configure the paths, the blocks live on each DataNode under /tmp/hadoop-(username)/dfs/data (the default location derived from hadoop.tmp.dir).
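In other words, the paths printed by hadoop fs -ls exist only in the HDFS namespace; on the local disk you will only find block files. A minimal sketch, assuming the default hadoop.tmp.dir of /tmp/hadoop-<user>:

hadoop fs -ls /user/myhadoop               # relative paths such as "testdir" are created under the user's HDFS home directory
ls /tmp/hadoop-myhadoop/dfs/data/current   # on a DataNode: blk_* block files and their .meta checksums, not whole files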
------ For reference only --------------------------- ------------
Thank you, but I checked:
my /tmp directory contains these files:
hadoop-myhadoop-datanode.pid
hadoop-myhadoop-jobtracker.pid
hadoop-myhadoop-namenode.pid
hadoop-myhadoop-secondarynamenode.pid
hadoop-myhadoop-tasktracker.pid
What are these files?
There are also these folders:
Jetty_0_0_0_050060_task___.2vcltf
Jetty_0_0_0_050075_datanode___hwtdwq
Jetty_0_0_0_050070_hdfs___w2cu88
Jetty_0_0_0_050090_secondary___y6aanv
but I didn't see the /tmp/hadoop-(username)/... directory you mentioned,
and I don't recognize what's under Jetty_0_0_0_050075_datanode___hwtdwq either. Could you explain a bit more? Thank you.
------ For reference only ------------------------------------ ---
It stores file blocks there, not your files one-to-one, so you won't see them by name.
Map/Reduce error: Error in configuring object
I'm new to Map/Reduce programming and want to read from and write to an Oracle database through DBInputFormat, but I keep getting the error "Error in configuring object".
Could it be that Hadoop 0.20.0's DBInputFormat doesn't support Oracle? Any pointers from the experts would be appreciated...
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Iterator;
import java.util.Random;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;
import org.apache.hadoop.mapred.lib.db.DBOutputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;
import org.apache.hadoop.util.StringUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.hsqldb.Server;
public class WordCountDB {
    // Standard Oracle thin JDBC driver class and jdbc:oracle:thin:@host:port:SID URL format
    private static final String DRIVER_CLASS = "oracle.jdbc.driver.OracleDriver";
    private static final String DB_URL = "jdbc:oracle:thin:@127.0.0.1:1521:BUS";
    private static final String DB_USER = "manage_bus";  // database user
    private static final String DB_PASSWD = "its312";    // password
    private static final String[] FieldNames = {"name", "age"};
    private static Connection connection;                // database connection (unused here)
    public static boolean initialized = false;

    public static class TokenizerMapper extends MapReduceBase implements
            Mapper<LongWritable, TeacherRecord, Text, DoubleWritable> {
        @Override
        public void map(LongWritable key, TeacherRecord value,
                        OutputCollector<Text, DoubleWritable> output, Reporter reporter)
                throws IOException {
            System.out.println("enter the map function");
            output.collect(new Text(value.name), new DoubleWritable(value.age));
        }
    }

    public static class IntSumReducer extends MapReduceBase implements
            Reducer<Text, DoubleWritable, TeacherRecord, NullWritable> {
        NullWritable n = NullWritable.get();
        @Override
        public void reduce(Text key, Iterator<DoubleWritable> values,
                           OutputCollector<TeacherRecord, NullWritable> output, Reporter reporter)
                throws IOException {
            System.out.println("enter the reduce function");
            double sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(new TeacherRecord(key.toString(), sum), n);
        }
    }

    static class TeacherRecord implements Writable, DBWritable {
        String name;
        double age;

        // DBInputFormat creates record instances by reflection, so a no-arg constructor is required
        public TeacherRecord() {
        }

        public TeacherRecord(String m_name, double m_age) {
            this.name = m_name;
            this.age = m_age;
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            this.name = Text.readString(in);
            this.age = in.readDouble();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            Text.writeString(out, this.name);
            out.writeDouble(age);
        }

        @Override
        public void readFields(ResultSet resultSet) throws SQLException {
            this.name = resultSet.getString(1);
            this.age = resultSet.getDouble(2);
        }

        @Override
        public void write(PreparedStatement statement) throws SQLException {
            statement.setString(1, this.name);
            statement.setDouble(2, this.age);
        }
    }

    public static void main(String[] args) throws Exception {
        String driverClassName = DRIVER_CLASS;
        String url = DB_URL;
        String dbuser = DB_USER;
        String dbpw = DB_PASSWD;

        // JobConf job = new JobConf(getConf(), WordCountDB.class);
        // (getConf() is only available when the class extends Configured / implements Tool)
        JobConf job = new JobConf(WordCountDB.class);
        job.setJobName("Count from DB");
        job.setMapperClass(TokenizerMapper.class);
        // job.setCombinerClass(IntSumReducer.class);
        // Removed: a combiner's output types must match the map output types (Text, DoubleWritable),
        // but IntSumReducer emits (TeacherRecord, NullWritable).
        job.setReducerClass(IntSumReducer.class);
        job.setInputFormat(DBInputFormat.class);
        DBConfiguration.configureDB(job, driverClassName, url, dbuser, dbpw); // connect to the database
        DBInputFormat.setInput(job, TeacherRecord.class, "teacher", null, "name", FieldNames);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);
        DBOutputFormat.setOutput(job, "teacher", FieldNames);
        job.setOutputKeyClass(TeacherRecord.class);
        job.setOutputValueClass(NullWritable.class);
        try {
            JobClient.runJob(job);
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("job failed with an exception");
        }
    }
}
The following error is reported:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:400)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
at WordCountDB.main(WordCountDB.java:205)
I have been stuck on this for a long time; I hope someone can help. Thank you!
------ Solution --------------------------------------------
You're already ahead of me — you've actually got Map/Reduce jobs running. I also work on cloud-computing-related business but haven't done hands-on Map/Reduce programming, so I can't help; bumping for visibility.
------ Solution --------------------------------------------
For the database side I'd recommend HBase or Hive.
------ For reference only ---------------------------------------
But the business requirement is that the data lives in Oracle, so there is no way around that.
I'm only a fourth-year student learning from a postdoc senior, so I don't know much yet.
------ For reference only ---------------------------------------
This problem has been solved: add CLASSPATH and HADOOP_CLASSPATH environment variables in Cygwin pointing at HADOOP/lib, then copy the whole Oracle JDBC library into HADOOP/lib. But then another problem appears: Error reading task output http://chengzhf:50060/tasklog?plaintext=true&taskid=attempt_201108201645_0004_m_000001_0&filter=stdout
and the job takes a very long time to complete.
Hoping an expert can weigh in.
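As an alternative to copying the Oracle library into HADOOP/lib on every node, a common pattern is to upload the JDBC driver jar to HDFS once and add it to the job's classpath through the DistributedCache. This is only a sketch against the old Hadoop 1.x mapred API; the jar path /lib/ojdbc6.jar is illustrative and must already exist on HDFS (e.g. via hadoop fs -put ojdbc6.jar /lib/ojdbc6.jar).

import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class AddJdbcDriverToJob {
    // Call this during job setup, before JobClient.runJob(job).
    public static void addOracleDriver(JobConf job) throws IOException {
        // Ships the jar to every task and puts it on the task classpath
        DistributedCache.addFileToClassPath(new Path("/lib/ojdbc6.jar"), job);
    }
}

This way the task JVMs on every node see the driver without having to touch each node's HADOOP/lib directory.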
------ For reference only ---------------------------------------
I have the same question!!!
------ For reference only ---------------------------------------
I really want to know how to solve this problem.
------ For reference only ---------------------------------------
My Nutch 1.6 reports the same error — what causes it? Thank you.
The hadoop log file shows:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 10 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 13 more
Caused by: java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:123)
at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:74)
... 18 more
The console error is simply: job failed.
Monday, August 12, 2013
WordCount remote debugging exception
Hadoop cluster: CentOS 5.8, JDK 1.7, Hadoop 1.0.1.
master: 192.168.1.101; slaves: 192.168.1.102, 192.168.1.103, 192.168.1.104.
A separate development environment is configured:
OS: Win2003, eclipse-jee-helios-SR2-win32.zip, JDK 1.7.
After setting up the development environment and debugging the wordcount program, I get the following error:
12/04/24 15:32:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/04/24 15:32:44 ERROR security.UserGroupInformation: PriviledgedActionException as:Administrator cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator-519341271\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator-519341271\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at com.hadoop.learn.test.WordCountTest.main(WordCountTest.java:85)
Online tips say this is a file-permission issue when running under Windows and that it does not occur under Linux.
So I built a separate CentOS development environment, but the problem still exists.
After adding conf.set("mapred.job.tracker", "192.168.1.155:9001"); to the code, the problem is resolved, but
the program runs very slowly, although it does produce the correct result. Tracking the tasks at http://192.168.1.101:50030 shows that slave1 and slave2 have this error: state: failed. Error: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Running wordcount directly on the master does not produce this error.
Alternatively, in the Windows development environment, change checkReturnValue in /hadoop-1.0.2/src/core/org/apache/hadoop/fs/FileUtil.java:
private static void checkReturnValue(boolean rv, File p,
                                     FsPermission permission
                                     ) throws IOException {
  /*
  if (!rv) {
    throw new IOException("Failed to set permissions of path: " + p +
                          " to " +
                          String.format("%04o", permission.toShort()));
  }
  */
}
After commenting this out and recompiling hadoop-core-1.0.1.jar, the Shuffle Error disappears. This has puzzled me for a long time; I hope an expert can explain it.
------ Solution --------------------------------------------
It looks like Hadoop starts many threads by default, and an error appears once a limit is exceeded; try adjusting the configuration file.
------ For reference only ---------------------------------------
The error above appears after the map phase completes, during the reduce phase, around the time reduce reaches 16%.
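For readers hitting the same staging-permission problem: the working remote-submission setup described above boils down to pointing the client explicitly at the cluster, so that job staging happens on HDFS instead of the local Windows file system (where the 0700 permission check fails). A minimal sketch follows, using the Hadoop 1.x new-API Job class; the addresses are taken from this thread and are illustrative, and the mapper/reducer/jar setup is elided.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RemoteWordCountSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the remote cluster instead of the local job runner
        conf.set("fs.default.name", "hdfs://192.168.1.101:9000");  // namenode address (assumed from the thread)
        conf.set("mapred.job.tracker", "192.168.1.101:9001");      // jobtracker address (assumed from the thread)
        Job job = new Job(conf, "wordcount-remote");
        // ... set the jar, mapper, reducer and input/output paths as in the usual WordCount example ...
    }
}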
Hadoop-based open-source network disk
I want to study a Hadoop-based network drive. Are there any reasonably mature open-source network-disk projects available?
------ Solution --------------------------------------------
I am looking into the same question.
------ For reference only ---------------------------------------
The reply deleted by an administrator at 2013-04-07 14:51:59
Standalone HBase: running create 'test', 'cf' in the shell reports an ERROR
ERROR: org.apache.hadoop.hbase.PleaseHoldException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
------ Solution --------------------------------------------
good, ggiqggfn@163.com
------ Solution --------------------------------------------
If your system is Ubuntu, the cause may be that /etc/hosts is written like this:
127.0.0.1 localhost
127.0.1.1 ubuntu.ubuntu-domain ubuntu
Change it to:
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
and it should work.
The official documentation says the same thing:
http://hbase.apache.org/book/quickstart.html
------ Solution --------------------------------------------
Oh, right. My pseudo-distributed Hadoop setup had the same problem; after making that change and restarting HBase it worked.
------ Solution --------------------------------------------
I am on Red Hat and have the same problem, but changing ::1 to 127.0.0.1 and restarting HBase still does not solve it.
------ Solution --------------------------------------------
"Master is initializing"
means the master is still initializing and HMaster has not finished starting up. Wait a while and it should be fine.
------ Solution --------------------------------------------
Just my own opinion.
------ For reference only ---------------------------------------
???
------ For reference only ---------------------------------------
I saw what the official website says — thank you!
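A quick way to tell whether the problem is really the master still initializing (rather than the /etc/hosts issue) is to ask the master directly from a small client. This is only a sketch against the HBase 0.9x-era client API; it reads hbase-site.xml from the classpath and throws MasterNotRunningException if no master is reachable.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CheckMaster {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml (ZooKeeper quorum, etc.) from the classpath
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf); // fails fast if no master is up
        System.out.println("master running: " + admin.isMasterRunning());
    }
}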
Hadoop under WinXP: formatting cannot find the JDK
My file paths:
C:/cygwin
c:/cygwin/home/Administrator/hadoop-1.0.4
d:/Java/jdk1.6.0_03
Added to hadoop-env.sh:
export
export JAVA_HOME=/cygdrive/d/Java/jdk1.6.0_03
export HADOOP_COMMON_HOME=/home/Administrator/hadoop-1.0.4
$ bin/hadoop NameNode -format
returns
java.lang.NoClassDefFoundError: NameNode
Exception in thread "main"
How can I solve this? My Java programs run fine, yet Hadoop cannot be formatted.
------ Solution --------------------------------------------
bin/hadoop namenode -format
------ For reference only ---------------------------------------
Thanks, javalzbin. I was following a Hadoop book and did not notice that the command name is lowercase.
Sunday, August 11, 2013
Running the Hadoop example WordCount reports: Unknown program 'WordCount' chosen
------ Solution --------------------------------------------
The program name is lowercase: "wordcount".
------ For reference only ---------------------------------------
A jar packaged in Eclipse fails to run on Linux
I developed the project in Eclipse. The program has two jobs, so I wrote a scheduling class, JobSchedule, to control the execution order of the two jobs and to handle the input paths and related setup.
The whole thing runs successfully inside Eclipse with no problem, but when the program is packaged into a jar and executed on Linux under bin/hadoop, the log output differs from what I see in Eclipse and the program makes no progress. I have no idea what else to try.
The command executed on Linux is bin/hadoop jar MyJob.jar
Because the input and output paths are set inside the program itself, no input/output path arguments are passed on the command line.
Has anyone run into this, or does anyone know the reason? Thanks in advance.
------ Solution --------------------------------------------
You can only dig into the error log yourself.
------ Solution --------------------------------------------
I haven't hit this myself, so I'm not sure.
When you export the jar, try choosing the "runnable JAR file" option.
------ For reference only ---------------------------------------
Anyone?
------ For reference only ---------------------------------------
Ah, problem solved. It turns out every machine's IP address and hostname must be added to /etc/hosts, and on the master the first line of /etc/hosts (127.0.0.1 ...) should be commented out.
Thanks to everyone who helped.
Running Hive on Hadoop errors: Call to localhost/127.0.0.1:9000 failed on connection
I have already formatted the namenode and built a two-node cluster, but this error is still reported:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:1475)
at org.apache.hadoop.hive.ql.exec.ExecDriver.addInputPaths(ExecDriver.java:1253)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:632)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
at org.apache.hadoop.ipc.Client.call(Client.java:720)
... 29 more
Job Submission failed with exception 'java.net.ConnectException(Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
------ Solution --------------------------------------------
This is probably related to the number of input files in your Hive job: the more map/reduce tasks it spawns, the more easily it gets stuck.
------ For reference only ---------------------------------------
I ran into the same problem. How do I solve it? I see it even in standalone mode.
------ For reference only ---------------------------------------
What is the solution to this? I'm hitting the same thing.
------ For reference only ---------------------------------------
Run hadoop namenode -format,
then restart Hadoop and it should be fine.
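Before re-formatting anything, though, it is worth confirming that a namenode is actually listening on the address Hive is trying to reach. This minimal sketch (Hadoop 1.x client API; the URI is the one from the error above, adjust it to match fs.default.name in your core-site.xml) fails with the same ConnectException if nothing is listening on port 9000:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CheckNamenode {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same address Hive is using in the stack trace above
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
        System.out.println("connected to " + fs.getUri());
        fs.close();
    }
}

If this fails too, the namenode is not running (check with jps and the namenode log) or fs.default.name points somewhere else; re-formatting, as suggested above, is usually only needed when the namenode itself refuses to start.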
no datanode to stop
On the master, jps now shows:
19802 NameNode
20075 JobTracker
19987 SecondaryNameNode
20186 Jps
111587 TaskTracker
Why is there a TaskTracker on the master? Shouldn't there be none?
And on my datanode the jps results are very strange:
18144 Jps
7137 TaskTracker
18077 TaskTracker
Why are there two TaskTrackers?
Then when I run stop-all.sh on the master I get:
192.168.66.128: no datanode to stop
My setup is just one master and one datanode.
There is another problem: on the datanode, the data folder under dfs in the tmp directory is empty — there is no current directory, so I cannot find the datanode's VERSION file and therefore cannot change the namespaceID on the datanode.
Has anyone run into this situation? Any help is much appreciated.
------ Solution --------------------------------------------
Your datanode never came up.
------ For reference only ---------------------------------------
What should I do then? Shouldn't running start-all.sh on the master be enough to bring the datanode up, without running anything on the datanode itself?
------ For reference only ---------------------------------------
The reply deleted by an administrator at 2013-03-15 21:30:18
------ For reference only ---------------------------------------
Right — it may be a configuration problem; take a look at the logs.
A question about accessing datanodes directly
The HDFS read path works like this: the client gets metadata from the namenode and then connects directly to the datanodes to fetch the data.
Doesn't that mean someone could connect to a datanode directly and pull down all of the data stored on it? Isn't that a problem?
I'm new to Hadoop and hope the experts can offer some guidance.
------ Solution --------------------------------------------
In HDFS a file is split into one or more blocks, and those blocks are not necessarily stored on the same data node. The namenode records which data nodes hold each block of each file, so even if you pull blocks straight off a datanode without going through the namenode, you have no way of knowing how to reassemble them into files.
------ Solution --------------------------------------------
The DN exposes an RPC interface, so you can access it directly!
------ For reference only ---------------------------------------
Thanks for the reminder.
------ For reference only ---------------------------------------
That's the correct answer.
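To make the explanation above concrete: the metadata the namenode hands out is essentially the per-block list of datanode locations, which the client then uses to read each block directly. The sketch below (Hadoop 1.x API; the file path is purely illustrative) asks the namenode for that information.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/myhadoop/testdir/1.txt"); // illustrative path
        FileStatus status = fs.getFileStatus(file);
        // Ask the namenode which datanodes hold each block of the file
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset() + ", length " + block.getLength()
                    + ", hosts " + java.util.Arrays.toString(block.getHosts()));
        }
        fs.close();
    }
}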
Computing the maximum and minimum of an HBase table column with MapReduce
I have the wordcount program working and now want to do some statistics over an HBase table: use MapReduce to compute the maximum, minimum and average of one column, and to sort by it (in production the sort field holds numeric values).
------ Solution --------------------------------------------
HBase already has interfaces you can use for this!
Read up on HBase MapReduce integration; there is plenty of material online.
------ For reference only ---------------------------------------
My biggest problem is that I don't know how to feed the records of an HBase table into the map function as input. Surely I don't have to dump them into a file first and use that as the input? I'm new to Hadoop; expert guidance would be much appreciated!
------ For reference only ---------------------------------------
I'm a fourth-year student, so this is quite a headache!
OK. Thanks!
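To make the suggested approach concrete: you do not need to dump the table to a file first. HBase ships a TableInputFormat plus the TableMapReduceUtil helper, which wires an HBase table directly into the map phase, one row per map() call. Below is a minimal mapper-side sketch against the HBase 0.9x mapreduce API; the table name "mytable" and the column family/qualifier "cf"/"num" are illustrative, and the aggregation reducer is left out.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class HBaseColumnStats {

    // Each map() call receives one HBase row as a Result
    static class ValueMapper extends TableMapper<Text, DoubleWritable> {
        private static final Text KEY = new Text("value");
        @Override
        protected void map(ImmutableBytesWritable row, Result result, Context context)
                throws IOException, InterruptedException {
            byte[] cell = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("num")); // family/qualifier are illustrative
            if (cell != null) {
                context.write(KEY, new DoubleWritable(Double.parseDouble(Bytes.toString(cell))));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-column-stats");
        job.setJarByClass(HBaseColumnStats.class);
        Scan scan = new Scan();
        scan.setCaching(500);        // fetch rows from the region servers in batches
        scan.setCacheBlocks(false);  // don't pollute the region server block cache from MR
        TableMapReduceUtil.initTableMapperJob("mytable", scan, ValueMapper.class,
                Text.class, DoubleWritable.class, job);
        // Map-only for this sketch; add a single reducer to track max / min / sum / count over the values.
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}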