Saturday, August 17, 2013

Can Hadoop be used for real-time processing?

This post was last edited by chaijunkun on 2012-03-16 11:24:18
My question itself may be flawed — frankly, I don't know Hadoop well yet. All I know is that Hadoop is a framework for distributed processing of big data.
But now I have a question:

Hadoop is widely used for log processing, but logs are generated continuously. Since Hadoop applications are MapReduce jobs, can Hadoop do real-time log analysis and processing? Or can it only handle logs whose content no longer changes (for example, yesterday's or last month's)? I hope the experts here can give an answer. If my description isn't clear, please point that out and I'll add details.
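For context, the batch model the question refers to can be sketched in a few lines of plain Python (this is only an in-memory illustration of the MapReduce idea, not the actual Hadoop API): the job counts log lines per severity level over a file that is already complete.

```python
from collections import defaultdict

def mapper(line):
    # Emit (level, 1) for each log line, e.g. "ERROR disk full" -> ("ERROR", 1)
    level = line.split()[0]
    return (level, 1)

def reducer(pairs):
    # Sum counts per key, as the reduce phase would after the shuffle.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

log = [
    "INFO service started",
    "ERROR disk full",
    "INFO request served",
    "ERROR disk full",
]
print(reducer(mapper(line) for line in log))  # {'INFO': 2, 'ERROR': 2}
```

The point of the sketch is that the whole input must exist before the job runs; lines appended to the log afterwards are simply not seen until the next batch job — which is exactly the tension the question is about.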
------ Solution --------------------------------------------
I don't think it can. Hadoop has never emphasized response time; what it emphasizes is high throughput.
------ Solution --------------------------------------------
Each time Hadoop starts, it merges its log files with the data to bring it up to date, possibly with the help of the secondarynamenode, and individual operations don't update the data immediately. That seems to be the case as I remember it from a book, but I don't have the book with me and can't recall the details clearly, sorry. This answer is for reference only.
------ Solution --------------------------------------------
OP, you've hit the key point. Hadoop is built for batch jobs: it emphasizes data throughput, but once a job is submitted it can no longer be changed, and a single job can sometimes run for hours.
What you're describing is really the problem of real-time MapReduce, which is one of the hot research topics around Hadoop. You could look at Twitter's Storm framework, which is designed specifically for large real-time data streams.
Writing this up wasn't easy — how about some points? ~
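The contrast with the batch model above can be sketched the same way: a streaming-style processor keeps state and updates it once per incoming event, so results are current after every line — the processing style Storm is built around. This is a plain-Python sketch of the idea, not the Storm API.

```python
from collections import defaultdict

class StreamingLevelCounter:
    """Toy per-event counter: state is updated as each log line arrives."""

    def __init__(self):
        self.counts = defaultdict(int)

    def on_line(self, line):
        # Called once per incoming log line; the running totals are
        # up to date immediately, with no batch job to wait for.
        self.counts[line.split()[0]] += 1
        return dict(self.counts)

counter = StreamingLevelCounter()
counter.on_line("INFO service started")
latest = counter.on_line("ERROR disk full")
print(latest)  # {'INFO': 1, 'ERROR': 1}
```

The trade-off runs the other way from MapReduce: each event is cheap to absorb and results are always fresh, but there is no single job with a defined beginning and end over the whole data set.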
------ Solution --------------------------------------------
Hadoop cannot do real-time processing; its response time is far too slow. If you used it for an online application system, the system would be dead on arrival.
------ For reference only -----------------------------------
Don't know. Just passing by and learning.
------ For reference only -----------------------------------
Thanks to "maxim_sin" for the answer. Indeed, as a high-throughput computing framework, Hadoop has a hard time meeting real-time requirements. I especially agree with the point that "a single job can sometimes run for hours": once a job has started and may not even have finished, how is it supposed to handle the data that keeps arriving in real time? Someone might say, "Just bring in Hadoop when you launch the project, and you'll keep up." But don't forget there is a limit to how much data a cluster can handle. If your Hadoop cluster has only a few machines but the data volume is too large, processing won't finish in time, the unprocessed data will pile up into a backlog, and the backlog will only keep growing. The only way to avoid that is to limit the amount of data it handles. I've also awarded points to the other users who answered, depending on their contribution. Thread closed.
