# MapReduce: Train Random Forest with Python and Hadoop

## Install Hortonworks Sandbox

The Hortonworks Sandbox provides a convenient playground where Hadoop beginners can test their big data applications.

## Reducer

The code here is a modified version of the reducer in this blog
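As a rough illustration of the shape of a Hadoop Streaming reducer (this is not the modified code referred to above), here is a minimal sketch. It assumes the mapper emits one `tree_id<TAB>serialized_tree` record per line; that record format, and the names `group_by_key` and `main`, are assumptions made for this example.

```python
import sys


def group_by_key(lines):
    """Group tab-separated `key<TAB>value` records by key.

    Hadoop Streaming hands the reducer its input sorted by key, so all
    records that share a key arrive on consecutive lines.
    """
    current_key, values = None, []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        if key != current_key:
            if current_key is not None:
                yield current_key, values
            current_key, values = key, []
        values.append(value)
    if current_key is not None:
        yield current_key, values


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Write every (assumed) serialized tree back out, one record per line."""
    for key, values in group_by_key(stdin):
        for value in values:
            stdout.write("%s\t%s\n" % (key, value))
```

To use this as a streaming reducer, call `main()` at the bottom of the script so that it reads from standard input and writes to standard output.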

## Test Mapper and Reducer

• Generate a file forest_number.txt containing n lines, where n is the number of trees you want to generate. Because one tree is generated per line, each mapper loads the training data (iris) once and randomly selects features and records for each tree.
• If you want to generate 5 trees, forest_number.txt contains 5 lines
• Make the mapper and reducer executable
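The preparation steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the actual contents of each line in forest_number.txt are not given here, so the example simply writes a tree index per line, and `sample_for_tree` is a hypothetical helper showing one common way to pick records (with replacement) and features (without replacement) for a single tree.

```python
import os
import random
import stat


def write_forest_file(path, n_trees):
    """Write n_trees lines; each line is one input record, i.e. one tree."""
    with open(path, "w") as f:
        for i in range(1, n_trees + 1):
            f.write("%d\n" % i)  # assumption: the line content is a tree index


def make_executable(path):
    """Add execute permission for user, group and others (like `chmod +x`)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def sample_for_tree(n_records, n_features, n_selected, rng):
    """Return (row indices, feature indices) for one tree.

    Rows are drawn with replacement (a bootstrap sample); features are
    drawn without replacement and returned in sorted order.
    """
    rows = [rng.randrange(n_records) for _ in range(n_records)]
    features = sorted(rng.sample(range(n_features), n_selected))
    return rows, features
```

For example, `write_forest_file("forest_number.txt", 5)` produces a five-line file, and `sample_for_tree(150, 4, 2, random.Random())` matches the shape of the iris data (150 records, 4 features). The script names passed to `make_executable`, such as `mapper.py` and `reducer.py`, are placeholders for your own files.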

• After this step, the generated trees should be stored in /demo/output
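A job that writes its results to /demo/output could be launched from Python roughly as follows. This is a sketch, not the exact command used in this post: the streaming jar location depends on your installation and must be supplied, the input path is up to you, and the script names `mapper.py` and `reducer.py` are placeholders.

```python
import subprocess


def streaming_command(streaming_jar, input_path, output_path,
                      mapper="mapper.py", reducer="reducer.py"):
    """Build the argument list for a Hadoop Streaming job.

    streaming_jar is the path to the hadoop-streaming jar on your machine;
    mapper and reducer are placeholder script names.
    """
    return [
        "hadoop", "jar", streaming_jar,
        "-files", "%s,%s" % (mapper, reducer),  # ship both scripts with the job
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


def run_job(streaming_jar, input_path, output_path="/demo/output"):
    """Run the job; raises CalledProcessError if Hadoop exits non-zero."""
    subprocess.check_call(
        streaming_command(streaming_jar, input_path, output_path))
```

Building the command as a list and keeping it separate from the call makes it easy to print and inspect before anything is submitted to the cluster.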

• Clean up the output folder after the experiment. This step is important because Hadoop will not overwrite an existing output folder; a job whose output folder already exists fails instead.
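The cleanup can also be scripted. The sketch below assumes the `hdfs` command is on your PATH and that the output folder is /demo/output as above; the function names are made up for this example.

```python
import subprocess


def cleanup_command(output_path="/demo/output"):
    """Build the HDFS command that recursively removes the output folder."""
    return ["hdfs", "dfs", "-rm", "-r", output_path]


def clean_output(output_path="/demo/output"):
    """Remove the folder and return the exit code.

    subprocess.call is used instead of check_call so that a missing
    folder (non-zero exit code) does not raise an exception.
    """
    return subprocess.call(cleanup_command(output_path))
```

Running `clean_output()` before each new experiment keeps the next job from failing on a leftover output folder.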