
Named Entity Extraction on Twitter Stream using Apache Spark Streaming and Stanford CoreNLP


dhwajraj/spark-twitter-named-entity


Twitter Named Entity extraction using Spark Streaming

This is a fast named entity extraction module that listens to the Twitter stream and processes it with Spark Streaming.

For simplicity, this project listens to the public tweet stream of all users and filters it using only the names of Fortune 500 companies, so that it keeps tweets related to corporations.
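The company filter can be sketched in plain Scala as below. This is not the project's code: the `CompanyFilter` object, its `mentionsCompany` helper, and the four-name list are hypothetical stand-ins for the full Fortune 500 list, and case-insensitive whole-word matching is an assumption about how the filter behaves.

```scala
// Minimal sketch, not the project's actual code: keep only tweets that
// mention at least one company name from a (hypothetical) watch list.
object CompanyFilter {
  // Hypothetical subset; the real project filters on Fortune 500 names.
  val companies: Seq[String] = Seq("Apple", "Microsoft", "Verizon", "Alphabet")

  // Case-insensitive whole-word match of any company name in the tweet text.
  def mentionsCompany(text: String, names: Seq[String] = companies): Boolean = {
    val lower = text.toLowerCase
    names.exists { name =>
      // \b ... \b stops "apple" from matching inside "pineapple".
      val pattern = ("\\b" + java.util.regex.Pattern.quote(name.toLowerCase) + "\\b").r
      pattern.findFirstIn(lower).isDefined
    }
  }

  // Keep only the tweets that mention a watched company.
  def filterTweets(tweets: Seq[String]): Seq[String] =
    tweets.filter(t => mentionsCompany(t))
}
```

Inside a Spark Streaming job the same predicate could be applied to each micro-batch with `DStream.filter`, for example `tweets.filter(s => CompanyFilter.mentionsCompany(s.getText))` on a stream of twitter4j `Status` objects.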

Note: this module does not classify named entities into classes such as person, location, or organization. Named entity extraction is useful as input to a subsequent layer of entity classification or knowledge-base lookup.
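The sample output shows multi-word entities such as "Russell Crowe" and "United Kingdom" collected into one `Set` per tweet. A minimal sketch of that final grouping step, assuming the tagger has already marked each token as entity or non-entity: the `Token` class, its `isEntity` flag, and `EntitySpans.extract` are hypothetical names, and in the project the flags would come from the trained model rather than being written by hand.

```scala
// Hypothetical token type: the word plus a flag saying whether the tagger
// marked it as part of a named entity (assumption, not the project's API).
final case class Token(word: String, isEntity: Boolean)

object EntitySpans {
  // Merge each run of consecutive entity tokens into one phrase and
  // return the distinct phrases for a single tweet.
  def extract(tokens: Seq[Token]): Set[String] = {
    val spans = scala.collection.mutable.ListBuffer.empty[String]
    val current = scala.collection.mutable.ListBuffer.empty[String]
    for (t <- tokens) {
      if (t.isEntity) current += t.word
      else if (current.nonEmpty) {
        // A non-entity token closes the current run.
        spans += current.mkString(" ")
        current.clear()
      }
    }
    // Flush a run that reaches the end of the tweet.
    if (current.nonEmpty) spans += current.mkString(" ")
    spans.toSet
  }
}
```

For the hand-tagged tokens of "Russell Crowe flies to United Kingdom", `extract` returns the two phrases "Russell Crowe" and "United Kingdom", the same shape as one row of the sample output.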

Get Running

  • Check out the project
  • Fill in your Twitter API keys in TwitterMain.scala
  • Build the project with Maven
  • Download the pre-trained model from here
  • Run TwitterMain.scala, or submit the job with spark-submit
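The steps above have you put the keys directly into TwitterMain.scala. One alternative, sketched here as an assumption rather than what the project does, is to read them from environment variables and copy them into the standard twitter4j system properties, which spark-streaming-twitter uses when no explicit authorization is passed to the stream. The `TwitterCredentials` object and the `TWITTER_*` variable names are hypothetical.

```scala
// Sketch (assumption): load Twitter credentials from environment variables
// instead of hard-coding them. The env var names are hypothetical; the
// property keys are the standard twitter4j OAuth properties.
object TwitterCredentials {
  private val keys = Seq(
    "TWITTER_CONSUMER_KEY"        -> "twitter4j.oauth.consumerKey",
    "TWITTER_CONSUMER_SECRET"     -> "twitter4j.oauth.consumerSecret",
    "TWITTER_ACCESS_TOKEN"        -> "twitter4j.oauth.accessToken",
    "TWITTER_ACCESS_TOKEN_SECRET" -> "twitter4j.oauth.accessTokenSecret"
  )

  // Copies each available value into a JVM system property and returns
  // the names of any environment variables that were missing.
  def load(env: Map[String, String] = sys.env): Seq[String] = {
    val missing = keys.collect { case (envName, _) if !env.contains(envName) => envName }
    for ((envName, prop) <- keys; value <- env.get(envName))
      System.setProperty(prop, value)
    missing
  }
}
```

Calling `TwitterCredentials.load()` before creating the streaming context keeps secrets out of the source file; a non-empty result means some credentials still need to be set.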


Sample Output Stream:

(each row corresponds to one tweet)

Set(Microsoft Xbox)
Set(Jacob Johanssen, Facebook)
Set(Spotify, Google, iTunes, Apple Music)
Set(Russell Crowe, United Kingdom)
Set(JS, Facebook, NPM)
Set(National Anthem)
Set(EXO)
Set()
Set(Verizon Wireless Android Smartphone, Samsung N920 Galaxy)
Set(ProChat24 File)
Set(US Supreme Court)
Set(Dusshera)
Set(William Hill, Poker-Stars)
Set(Social Media Surveillance, Instagram Share Data)
Set(Jacob Osei Yeboah)
 .............continuous stream .....
