Tom Maiaroto edited this page Aug 12, 2014 · 4 revisions

Overview

Social Harvest is a social media analytics platform designed to make social metrics more readily available to users. It monitors subject matter, conversations, and sharing across the internet, not just on social networks. For social sharing, it harvests two main types of metrics:

  1. Content Metrics
  2. Growth Metrics

Content Metrics

A territory is defined by a set of criteria such as keywords, URLs, Facebook pages, and social media accounts. Social Harvest crawls the internet to discover social information that belongs in this territory, such as mentions, comments, shared web pages, shared media, keywords, hashtags, and more. Essentially, it looks at how internet users are engaging. Social Harvest then processes and refines this data, enriching it with additional details such as sentiment analysis, gender, and semantic tagging.

These content metrics reveal a lot about a given subject matter across the social web. They can help marketers execute elaborate marketing plans, support content marketing, and help researchers studying a subject from an educational or social perspective. The use cases and users are many.

Growth Metrics

Growth metrics indicate influence and virality across the internet for contributors (and content). Social Harvest monitors changes in metrics such as followers, likes, and so on. This can tell a lot about a particular user's audience and their reach. A user in this case could be an individual or an organization or company.
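As a rough illustration of what "monitoring changes" in a metric can mean, the sketch below compares two follower counts taken at different times. The `Snapshot` type and `Growth` function are hypothetical names chosen for this example; they are not taken from the Social Harvest code base.

```go
package main

import (
	"fmt"
	"time"
)

// Snapshot is a hypothetical record of one account's follower count at a
// point in time. It is an assumption for this example, not a Social Harvest type.
type Snapshot struct {
	Time      time.Time
	Followers int
}

// Growth returns the absolute and percentage change in followers between two
// snapshots. The percentage is reported as 0 when the earlier count is 0,
// because a relative change is undefined in that case.
func Growth(prev, curr Snapshot) (delta int, pct float64) {
	delta = curr.Followers - prev.Followers
	if prev.Followers != 0 {
		pct = float64(delta) * 100 / float64(prev.Followers)
	}
	return delta, pct
}

func main() {
	start := time.Date(2014, 8, 11, 0, 0, 0, 0, time.UTC)
	prev := Snapshot{Time: start, Followers: 2000}
	curr := Snapshot{Time: start.Add(24 * time.Hour), Followers: 2100}

	delta, pct := Growth(prev, curr)
	fmt.Printf("%+d followers (%.1f%%) over %s\n", delta, pct, curr.Time.Sub(prev.Time))
}
```

Tracking a series of such deltas over time is one simple way to approximate an account's reach and how quickly it is changing.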

Both of these complex sets of metrics can be combined in some very interesting and useful ways.

How it Harvests

The harvester is written in Go, Google's programming language, and is designed for speed, efficiency, and flexibility. It performs harvesting, logging, and storage concurrently and is designed to make use of multithreading and parallel processing. Social Harvest is not limited by normal scaling concerns, but rather by how much data can be extracted from third-party APIs given their rate limits.
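The following is a minimal sketch of that general shape in Go, not the harvester's actual implementation: several sources are polled concurrently, each paced by a ticker that stands in for a rate limit, and every harvested item is handed to both a logging worker and a storage worker. All names here (`Message`, `harvest`, `Run`, `demoInterval`) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Message is a hypothetical harvested item (not a Social Harvest type).
type Message struct {
	Source string
	Text   string
}

// demoInterval is an arbitrary pause between simulated API requests.
const demoInterval = 5 * time.Millisecond

// harvest simulates polling one third-party API. It waits for a ticker
// before each request so that an assumed rate limit is respected.
func harvest(source string, n int, interval time.Duration, out chan<- Message) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 0; i < n; i++ {
		<-ticker.C // wait for the next permitted request slot
		out <- Message{Source: source, Text: fmt.Sprintf("item %d", i)}
	}
}

// Run harvests from all sources concurrently and passes every message to
// both a logging worker and a storage worker. It returns how many messages
// each worker handled.
func Run(sources []string, perSource int, interval time.Duration) (logged, stored int) {
	harvested := make(chan Message)
	toLog := make(chan Message, 16)
	toStore := make(chan Message, 16)

	// One producer goroutine per source.
	var producers sync.WaitGroup
	for _, s := range sources {
		producers.Add(1)
		go func(src string) {
			defer producers.Done()
			harvest(src, perSource, interval, harvested)
		}(s)
	}
	go func() {
		producers.Wait()
		close(harvested)
	}()

	// Logging and storage run side by side; here they only count messages.
	var consumers sync.WaitGroup
	consumers.Add(2)
	go func() {
		defer consumers.Done()
		for range toLog {
			logged++
		}
	}()
	go func() {
		defer consumers.Done()
		for range toStore {
			stored++
		}
	}()

	// Fan each harvested message out to both workers.
	for m := range harvested {
		toLog <- m
		toStore <- m
	}
	close(toLog)
	close(toStore)
	consumers.Wait()
	return logged, stored
}

func main() {
	logged, stored := Run([]string{"twitter", "facebook"}, 3, demoInterval)
	fmt.Println("logged:", logged, "stored:", stored)
}
```

In this arrangement the ticker interval, rather than the amount of local work, bounds throughput, which mirrors the point that third-party rate limits are the practical ceiling.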

While the application itself can store data in MySQL, PostgreSQL, and MongoDB, additional storage options are easily available to a user. Since all data is stored in a simple, flat form, nearly any database can be used to hold it. This flexibility is made possible by log file output. The optional log files written by the harvester can be processed using tools like Fluentd or Logstash, which have plugins to send the data to various data stores or applications for additional processing.
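To make the idea of a simple, log-friendly record concrete, here is a sketch that writes one flat record per line as JSON. Both the line-delimited JSON format and the field names in `Mention` are assumptions for this example; the text above does not specify the harvester's actual log format.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// Mention is a hypothetical flat record. The field names are illustrative
// and are not taken from Social Harvest's real log schema.
type Mention struct {
	Time      string  `json:"time"`
	Territory string  `json:"territory"`
	Network   string  `json:"network"`
	Text      string  `json:"text"`
	Sentiment float64 `json:"sentiment"`
}

// WriteLine writes one record to w as a single line of JSON.
// json.Encoder terminates each encoded value with a newline, and newlines
// inside string fields are escaped, so each record occupies exactly one line.
func WriteLine(w io.Writer, m Mention) error {
	return json.NewEncoder(w).Encode(m)
}

// Render returns the log line for m as a string.
func Render(m Mention) (string, error) {
	var buf bytes.Buffer
	if err := WriteLine(&buf, m); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	m := Mention{
		Time:      "2014-08-12T10:00:00Z",
		Territory: "example-territory",
		Network:   "twitter",
		Text:      "first line\nsecond line",
		Sentiment: 0.4,
	}
	if err := WriteLine(os.Stdout, m); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
	}
}
```

A flat, one-record-per-line layout like this is easy for a log shipper to tail and forward, and it maps naturally onto a table row or a document in most data stores.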

Bottom line: Social Harvest is designed to fit into nearly any workflow without getting in the way.

Harvesting is defined and performed on a configured schedule, and there are a few ways one could use the harvester:

Stand Alone Harvester

One could simply define a configuration file to harvest data out to log files only and then have something like Fluentd tail those files and send the data to a data store of their choosing. This would be, perhaps, the most minimal use of the harvester. Data would be flowing in one direction.

Social Harvest processes data so it is not a replica of what was discovered, but additional processing is quite possible when sending the data through another workflow using something like Fluentd. This would be one reason for using Social Harvest in this capacity.
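As a sketch of what such downstream processing might look like, the program below reads records line by line and tallies them per network. It assumes the log holds one JSON object per line with a `network` field; that format and the names `record`, `CountByNetwork`, and `sampleLog` are assumptions for this example, not documented Social Harvest behavior.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// record holds the only field this example cares about. The "network"
// field name is an assumption about the log format.
type record struct {
	Network string `json:"network"`
}

// CountByNetwork reads one JSON object per line from r and tallies the
// records per network. Blank and malformed lines are skipped.
func CountByNetwork(r io.Reader) (map[string]int, error) {
	counts := make(map[string]int)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Bytes()
		if len(line) == 0 {
			continue
		}
		var rec record
		if err := json.Unmarshal(line, &rec); err != nil {
			continue // not a valid record; ignore it
		}
		counts[rec.Network]++
	}
	return counts, sc.Err()
}

// CountString is a convenience wrapper for in-memory input.
func CountString(s string) (map[string]int, error) {
	return CountByNetwork(strings.NewReader(s))
}

// sampleLog stands in for the contents of a harvester log file.
const sampleLog = `{"network":"twitter","text":"hello"}
{"network":"facebook","text":"hi"}
not json
{"network":"twitter","text":"again"}
`

func main() {
	counts, err := CountString(sampleLog)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	for network, n := range counts {
		fmt.Printf("%s: %d\n", network, n)
	}
}
```

In practice a tool like Fluentd would usually do the tailing and forwarding; a small program like this one would sit at the receiving end of that workflow.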

Harvest and Store

One could also take it a step further and configure a SQL or MongoDB database connection to have the data stored. Data still flows in one direction, but in this instance Social Harvest takes care of the storage as well. The database (or databases) could be on the same server or hosted elsewhere.

At this point, one would need to retrieve the data to make use of it of course. While Social Harvest has a dashboard tool, one may wish to create their own custom tool for retrieving and analyzing harvested data. For example, one may wish to bring this data into an existing application. This would be one reason for using Social Harvest in this capacity.

Harvest, Store, and Visualize

Taking it a step further, one could make use of Social Harvest's API to retrieve the data it harvested. This would be the full-service use case, where a user could load a web page to visualize the harvested data and draw conclusions from it. Social Harvest's dashboard (a separate repository on GitHub) is simply a front-end that looks for specific messages from a server to draw graphs and generate reports. It is designed for use with the harvester's API, but could be used with another API provided it is compatible (or given modifications to the dashboard application).
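For readers who want to build their own front-end, the sketch below shows the general pattern of fetching JSON from an HTTP API and decoding it for display. The endpoint path, the `Point` type, and the response shape are all hypothetical; they do not describe the harvester's real API. A throwaway local test server stands in for the harvester so the example is self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Point is a hypothetical data point a dashboard might plot.
type Point struct {
	Day      string `json:"day"`
	Mentions int    `json:"mentions"`
}

// FetchPoints requests a JSON array of points from url and decodes it.
func FetchPoints(url string) ([]Point, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var pts []Point
	if err := json.NewDecoder(resp.Body).Decode(&pts); err != nil {
		return nil, err
	}
	return pts, nil
}

// Demo starts a local stand-in server, fetches from it, and shuts it down.
func Demo() ([]Point, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode([]Point{
			{Day: "2014-08-11", Mentions: 42},
			{Day: "2014-08-12", Mentions: 57},
		})
	}))
	defer srv.Close()
	return FetchPoints(srv.URL + "/mentions") // hypothetical path
}

func main() {
	pts, err := Demo()
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	for _, p := range pts {
		fmt.Printf("%s: %d mentions\n", p.Day, p.Mentions)
	}
}
```

Any client that can make HTTP requests and parse the responses could play the same role as the dashboard, as long as it agrees with the server on the message format.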

This use case would be the closest to the experience of a SaaS. The only difference is that the user must host the application on their own server. This is desirable, though, because it keeps costs as low as possible (one could even run it on an old computer at the office and leave it turned on 24/7 for the cost of internet access and electricity).

The other benefit here is that users own the harvested data. Most social media analytics services will not only keep the harvested data from being downloaded in its raw form, but also limit the amount of data harvested. Again, Social Harvest is limited only by the rate limits imposed by third-party APIs such as Facebook's and Twitter's. This often results in a higher volume than most services provide (without shelling out for a more expensive plan).

You can learn more about why Social Harvest was (re)created and why it is open-source software at SocialHarvest.io.