
Showing posts with label sna. Show all posts

Wednesday, January 28, 2009

Using Social Network Analysis with Version control data

As I mentioned in the last post, I am experimenting with social network analysis (SNA) on version control data. With the SVNPlot project, I now have a way of converting Subversion logs into an SQLite database, which allows me to query the data in many different ways.
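To give a rough idea of the kind of query this makes possible, here is a minimal sketch in Python. The table and column names (SVNLog, SVNLogDetail, revno, author, changedpath) are assumptions made for this example and may not match the schema SVNPlot actually generates. The sample rows are made up as well.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only; the real
# SVNPlot database may use different table and column names.
SCHEMA = """
CREATE TABLE SVNLog (revno INTEGER PRIMARY KEY, author TEXT);
CREATE TABLE SVNLogDetail (revno INTEGER, changedpath TEXT);
"""

SAMPLE_COMMITS = [(1, "alice"), (2, "bob"), (3, "alice")]
SAMPLE_CHANGES = [
    (1, "/trunk/views.py"),
    (1, "/trunk/models.py"),
    (2, "/trunk/views.py"),
    (3, "/trunk/views.py"),
]


def build_sample_db():
    """Create an in-memory database filled with the made-up rows above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO SVNLog VALUES (?, ?)", SAMPLE_COMMITS)
    conn.executemany("INSERT INTO SVNLogDetail VALUES (?, ?)", SAMPLE_CHANGES)
    return conn


def author_file_counts(conn):
    """Return (author, path, number of commits) rows, busiest pairs first."""
    query = """
        SELECT l.author, d.changedpath, COUNT(*) AS commits
        FROM SVNLog AS l
        JOIN SVNLogDetail AS d ON d.revno = l.revno
        GROUP BY l.author, d.changedpath
        ORDER BY commits DESC, l.author, d.changedpath
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in author_file_counts(build_sample_db()):
        print(row)
```

Each row of the result is one author-file pair with the number of commits behind it, which is the raw material for an author/file network.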

I used the Rietveld repository data and did some preliminary analysis. I am not an expert on SNA, but the initial results look very interesting and promising. You can see the results on my website.



Update: Oscar Castaneda has added SNA data extraction to SVNPlot as part of his GSoC 2010 project. He has used these modifications to analyze Apache repositories and reported his findings at ApacheCon. Check the details at:
  1. Life After Google Summer of Code by Oscar Castaneda
  2. Oscar's GSoC 2010 proposal 
  3. Details on how to use his contributions in SVNPlot to extract the data.

Sunday, January 18, 2009

Social Network Analysis and Version Control

Recently I came across the concept of Social Network Analysis.

Given below is a short introduction to Social Network Analysis from the Orgnet site:
Social network analysis [SNA] is the mapping and measuring of relationships and flows between people, groups, organizations, computers, web sites, and other information/knowledge processing entities. The nodes in the network are the people and groups while the links show relationships or flows between the nodes. SNA provides both a visual and a mathematical analysis of human relationships.
The concept originated in the social sciences (sociology, anthropology) as a way to study relationships in communities. Today it is used for detecting fraud rings, identifying leaders in organizational networks, analyzing the resilience of computer networks, and in various other ways. The case studies on the Orgnet site give a good idea of the possibilities.

I started thinking about applying SNA to version control history, with files and authors as nodes. There is some research going on in this area at universities. The references below have a few links, and a Google search for "data mining version control" will give you additional ones.

With SVNPlot, I now have a way of converting Subversion logs into an SQLite database. Python also has some excellent libraries for network analysis. I am using NetworkX for analysis and Matplotlib for visualization. I think such analysis will be useful in
  1. Identifying the key developers and their specific areas in the project.
  2. Identifying key files (files which are involved in code changes more frequently than others).
  3. Identifying clusters of related files (across directories and modules).
I think the results will be useful to software development companies as well, especially for getting advance warning of problems. On big projects in particular, they could help in identifying critical developers and in planning the technology transfer when people move from one project to another. I see many exciting possibilities.
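The three kinds of analysis listed above can be sketched with NetworkX roughly as follows. This is only an illustration under stated assumptions, not the analysis I actually ran. The (author, file) pairs are made up, plain node degree stands in for more refined centrality measures, and connected components of the file projection are a very crude notion of "cluster". The helper names (build_graph, rank, file_clusters) are hypothetical.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Made-up (author, file) pairs standing in for rows pulled from the
# version control database; none of this is real project data.
CHANGES = [
    ("alice", "views.py"),
    ("alice", "models.py"),
    ("alice", "urls.py"),
    ("bob", "views.py"),
    ("bob", "models.py"),
    ("carol", "README"),
]


def build_graph(changes):
    """Build a bipartite author/file graph; edge weight = number of changes."""
    graph = nx.Graph()
    for author, path in changes:
        # Tag node names so an author and a file can never collide.
        a, f = ("author", author), ("file", path)
        graph.add_node(a, kind="author")
        graph.add_node(f, kind="file")
        if graph.has_edge(a, f):
            graph[a][f]["weight"] += 1
        else:
            graph.add_edge(a, f, weight=1)
    return graph


def nodes_of(graph, kind):
    """Return all nodes whose 'kind' attribute matches."""
    return [n for n, data in graph.nodes(data=True) if data["kind"] == kind]


def rank(graph, kind):
    """Rank nodes of one kind by degree (number of distinct neighbours)."""
    return sorted(nodes_of(graph, kind), key=lambda n: (-graph.degree(n), n))


def file_clusters(graph):
    """Group files that are linked, directly or indirectly, by shared authors."""
    projected = bipartite.projected_graph(graph, nodes_of(graph, "file"))
    components = nx.connected_components(projected)
    return sorted(sorted(name for _, name in comp) for comp in components)


if __name__ == "__main__":
    g = build_graph(CHANGES)
    print("Authors by number of files touched:", rank(g, "author"))
    print("Files by number of authors:", rank(g, "file"))
    print("File clusters:", file_clusters(g))
```

On real data one would more likely link files that change in the same commit rather than files that merely share an author, and use the edge weights instead of ignoring them. The overall shape of the analysis stays the same.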

The initial results are interesting. I will put up the charts and analysis on my site in a few days' time.

References and Interesting Articles/Links
  1. Introduction to Social Network Analysis (from Orgnet.com)
  2. Case studies of Social Network Analysis (from Orgnet.com)
  3. Wikipedia page on Social Networks (Check the history of Social Network Analysis)
  4. Social Life of Routers (Computer networks as social networks)
  5. Finding Go-to People and Subject Matter Experts in Organization
  6. Predicting Defects using Network Analysis on Dependency Graphs – ICSE 2008
  7. Mining Software Archives (a special issue of IEEE magazine)