Implementing an unsupervised technique to classify the GitHub commits by their quality
Project Inspiration - We have all seen projects that try to classify the quality of other projects based on some statistics to achieve an objective. This is a similar project, but with the intent to build a system that can potentially help developers manage their projects more efficiently by giving high priority to certain commits.

Objective - Build a system that can classify GitHub commits on the basis of their quality using an unsupervised method (K-medoids and random forest).

Result / Outcome - The algorithm divided the commits into three categories: Cluster 1, Cluster 2, and Cluster 3. Cluster 1 represents low-quality commits, Cluster 2 represents mid-quality commits, and Cluster 3 represents high-quality commits. This figure tells us how the algorithm classified more than 300,000 commits. Below are tables generated to show the properties of each cluster.

Performance metrics - The performance metric table is generated by a random forest algorithm. It tells u...
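To make the clustering step in the objective more concrete, here is a minimal sketch of K-medoids on a handful of made-up commits. Everything in it is an assumption for illustration: the three features (lines changed, files changed, commit message length), the toy values, the Manhattan distance, and the farthest-first initialization are not taken from the project, and the random forest step is not shown. It only illustrates how K-medoids groups points around representative members of the data.

```python
# Hypothetical commit features: (lines_changed, files_changed, message_length).
# These values are invented for illustration, not taken from the project.
commits = [
    (2, 1, 10), (3, 1, 12), (1, 1, 8),
    (40, 4, 45), (45, 5, 50), (38, 3, 40),
    (120, 9, 90), (130, 10, 100), (125, 8, 95),
]


def manhattan(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Pick k starting medoids: index 0, then repeatedly the point
    farthest from the medoids chosen so far (an assumed init strategy)."""
    medoids = [0]
    while len(medoids) < k:
        nxt = max(
            range(len(points)),
            key=lambda i: min(manhattan(points[i], points[m]) for m in medoids),
        )
        medoids.append(nxt)
    return medoids


def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: manhattan(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, k, max_iter=100):
    """Alternate between assigning points and re-picking each cluster's medoid."""
    medoids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total
            # distance to the other members of its cluster.
            best = min(
                members,
                key=lambda i: sum(manhattan(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


medoids, labels = k_medoids(commits, k=3)
for commit, label in zip(commits, labels):
    print(commit, "-> cluster", label)
```

On real data the features would normally be scaled first so that one large-valued column does not dominate the distance. The cluster numbers produced here are arbitrary indices; deciding which cluster counts as low, mid, or high quality requires inspecting the properties of each cluster, as the tables in the post do.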