SIRecommender Project

Abstract

This project continued the work of the Spring 2016 CS4624 team, which built a procedure to find the top k friends based on information submitted via surveys. This year’s project analyzed the data received through these surveys to draw conclusions about the characteristics and demographics that are prominent among recovery buddies on a live social network, Friendica. This year’s team worked with data collected from the social network, which contains information about articles read, modules liked, and other activity that may be used to find commonalities and form a friendship.

After parsing the information, we evaluated friendships and the homophily-based measures that the two people in each friendship have in common. We looked for trends using data visualization (histograms) and a top-down approach. Our main focus was the top-down approach, in which we determined the similarity score of two recovery buddies from their similarities in demographics. Once we had identified the pertinent demographics, we calculated the probabilities of similarities so that we could statistically describe how friendships on Friendica are driven by similarity. This was part of our final deliverable. We also focused on diffusion: we analyzed the tendency of a user to attend a meeting, watch a video module, or complete other tasks because another recovery buddy had done so. This helped us identify how the network experiences diffusion. We used diffusion to identify users who experience high or low amounts of interaction with other users, and we identified their similarities through homophily-based measures.
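The similarity score and the probability of similarity described above can be illustrated with a minimal Python sketch. It is not the team's code: the attribute names, the sample users, and the choice to score a pair by the fraction of matching attributes are all assumptions made for illustration.

```python
def similarity(a, b, attributes):
    """Fraction of the listed attributes on which two users agree."""
    matches = sum(1 for attr in attributes if a[attr] == b[attr])
    return matches / len(attributes)


def share_rate(users, friendships, attr):
    """Proportion of friendships in which both users share the same value of attr."""
    same = sum(1 for u, v in friendships if users[u][attr] == users[v][attr])
    return same / len(friendships)


# Hypothetical survey records; the field names are assumptions.
users = {
    "u1": {"gender": "F", "income": "low", "substances": 2},
    "u2": {"gender": "F", "income": "low", "substances": 1},
    "u3": {"gender": "M", "income": "high", "substances": 2},
}
friendships = [("u1", "u2"), ("u1", "u3")]
attributes = ["gender", "income", "substances"]

pair_score = similarity(users["u1"], users["u2"], attributes)
gender_rate = share_rate(users, friendships, "gender")
print(pair_score, gender_rate)
```

Here `pair_score` is the share of demographic attributes on which one pair agrees, and `gender_rate` estimates how often friends have the same gender. Comparing such a rate with the rate among non-friends would be one way to judge whether an attribute is associated with friendship.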

One focus of this team was weighting the different feature types. This mainly meant tuning parameters and observing the changes those parameters produced, so that we understood how to tune them and how to improve the outcome. Another focus this semester was predicting friendships from the answers that participants submitted through the surveys. These findings gave the team insight into how to distinguish contact from homophily. The team gained a visual understanding of this information through histograms. Socio-economic status, gender, and number of addictive substances are the key aspects of homophily that were observed visually in this project.
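As a rough illustration of what weighting feature types can look like, the sketch below scores a pair of users with a weighted match over their attributes. The weights, the attribute names, and the scoring rule are hypothetical and are not taken from the project's code.

```python
def weighted_similarity(a, b, weights):
    """Weighted share of attributes on which users a and b agree.

    weights maps an attribute name to a non-negative weight; an attribute
    missing from either user counts as a mismatch.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    matched = sum(
        w for attr, w in weights.items()
        if attr in a and attr in b and a[attr] == b[attr]
    )
    return matched / total


# Hypothetical users; the field names are assumptions.
a = {"gender": "F", "income": "low", "substances": 2}
b = {"gender": "F", "income": "high", "substances": 2}

equal = weighted_similarity(a, b, {"gender": 1, "income": 1, "substances": 1})
income_heavy = weighted_similarity(a, b, {"gender": 1, "income": 2, "substances": 1})
print(equal, income_heavy)
```

Raising the weight on income lowers this pair's score because income is the one attribute on which they differ, which is the kind of parameter change the tuning described above would observe.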

Our results were the trends found by our top-down and diffusion approaches. The top-down approach identified 55 close relationships, many of them between users of the same gender and income level. Our diffusion results gave us the level of influence particular users had on each other. Our final deliverables consisted of documentation of these results and the code used to produce them. The top-down and diffusion results, along with our analysis of homophily, are our main deliverables.

Keywords
Social Interactome, SIRecommender, Homophily, Friendica, addiction recovery, clinical trial, lattice network, small-world network, diffusion
Citation