Using Hadoop to find popular words in open source Java code

Project Details

Project Lead
Lin Liu 
Project Manager
Lin Liu 
University of Nebraska-Lincoln, Computer Science and Engineering
Computer Science (401) 


Open source projects contribute a large amount of freely available code. Functionally similar code may share common “words”, such as class names, variable names, and method names. Based on these shared features, we might help programmers find other people's code that is most closely related to their own, for reference. Finding similar code directly may be difficult; finding common words is easier. As a first step toward that goal, this project identifies the popular words used in open source Java projects. We use Hadoop for the word-counting job because it is convenient and scales easily to larger data sets.
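The word-counting job follows the usual MapReduce pattern: a map step emits a (word, 1) pair for each identifier in a source file, and a reduce step sums those pairs per word. The sketch below is a minimal, in-memory illustration of that logic in plain Java, with no Hadoop dependencies. The class and method names (`PopularWords`, `map`, `countWords`, `topK`), the regular expressions, and the partial keyword list are all assumptions made for illustration, not the project's actual code; in a real Hadoop job the two steps would be placed in subclasses of Hadoop's `Mapper` and `Reducer` classes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PopularWords {
    // Identifier-like tokens: a letter, '_' or '$', then letters, digits, '_' or '$'.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    // Line comments, block comments, and string/char literals (assumption: only code words matter).
    private static final Pattern NOISE = Pattern.compile(
        "//[^\\n]*|/\\*.*?\\*/|\"(?:\\\\.|[^\"\\\\])*\"|'(?:\\\\.|[^'\\\\])*'",
        Pattern.DOTALL);

    // Hypothetical, partial list of Java keywords that are not counted as "words".
    private static final Set<String> KEYWORDS = Set.of(
        "public", "private", "protected", "static", "final", "class", "interface",
        "void", "int", "long", "double", "float", "boolean", "char", "byte", "short",
        "return", "new", "if", "else", "for", "while", "do", "import", "package",
        "extends", "implements", "this", "super", "null", "true", "false", "try",
        "catch", "finally", "throw", "throws", "switch", "case", "break", "continue");

    // Map step: emit a (word, 1) pair for every non-keyword identifier in one source file.
    static List<Map.Entry<String, Integer>> map(String source) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(NOISE.matcher(source).replaceAll(" "));
        while (m.find()) {
            String word = m.group();
            if (!KEYWORDS.contains(word)) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce steps: group the pairs by word and sum their counts.
    static Map<String, Integer> countWords(List<String> sources) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String source : sources) {
            for (Map.Entry<String, Integer> pair : map(source)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    // The k most frequent words, highest count first, ties broken alphabetically.
    static List<String> topK(Map<String, Integer> counts, int k) {
        List<String> words = new ArrayList<>(counts.keySet());
        words.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(words.subList(0, Math.min(k, words.size())));
    }

    public static void main(String[] args) {
        List<String> files = List.of(
            "public class Counter { private int count; void increment() { count++; } }",
            "// count words\nclass WordCounter { int count = 0; String s = \"count\"; }");
        Map<String, Integer> counts = countWords(files);
        System.out.println(counts);          // {Counter=1, String=1, WordCounter=1, count=3, increment=1, s=1}
        System.out.println(topK(counts, 2)); // [count, Counter]
    }
}
```

Comments and string literals are stripped before tokenizing, so the `count` inside `// count words` and `"count"` is not counted; only the three uses of `count` as an identifier are. Whether to strip them, and which keywords to skip, are design choices the real job would need to settle.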

Intellectual Merit

At this stage the project claims no intellectual merit, since it is a course project.

Broader Impacts

This project might help programmers find functionally related open source Java code, which would make their coding and debugging more convenient.

Scale of Use

I need to apply for around 20 VMs for this purpose.