Saturday, September 16, 2017

Design: MapReduce Programming Model

A programming model, with points of contact with MPI (Message Passing Interface), for processing and generating big data sets with a parallel, distributed algorithm on [low-cost] clusters:

https://en.m.wikipedia.org/wiki/MapReduce

Google used (and patented) MapReduce at some point in time to revolutionize the indexing of the WWW.

Quoting:
"A MapReduce program is composed of a Map()procedure (method) that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() method that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance."
(...)
"DeWitt and Stonebraker have subsequently published a detailed benchmark study in 2009 comparing performance of Hadoop's MapReduce and RDBMS approaches on several specific problems. They concluded that relational databases offer real advantages for many kinds of data use, especially on complex processing or where the data is used across an enterprise, but that MapReduce may be easier for users to adopt for simple or one-time processing tasks."