Thesis topic "Topic analysis for conferences"

(C) 2016, Software Languages Team, University of Koblenz, Ralf Lämmel

Objective

Given several conferences that overlap with some predefined area, we would like to understand the commonalities and differences among these conferences. To this end, the titles (and possibly abstracts) of papers in the different conferences are to be processed by a text analysis, a tagcloud-based visualization is to be derived, and some method of IR/machine learning should be applied so that the conferences can be compared quantitatively.

Deliverables

  • The actual software for highly automated computation of tagclouds for a given list of conferences.
  • The actual software for comparison, e.g., based on topic models.
  • A report that describes the approach, lists the results, and discusses them.
  • The report also needs to briefly discuss related work and underlying foundations.
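
As a starting point for the comparison software, the following minimal sketch compares two conferences by the cosine similarity of their title term counts. This is a deliberately simple baseline and not a topic model; the function names and the sample titles are made up for illustration only.

```python
import math
from collections import Counter


def term_frequencies(titles):
    """Count lower-cased word occurrences over a list of paper titles."""
    counts = Counter()
    for title in titles:
        counts.update(title.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty conference shares nothing with any other
    return dot / (norm_a * norm_b)


# Made-up titles, for illustration only (not real papers).
conf_a = ["parsing with grammars", "grammars for model transformation"]
conf_b = ["model transformation languages", "testing model transformation"]

tf_a = term_frequencies(conf_a)
tf_b = term_frequencies(conf_b)
similarity = cosine_similarity(tf_a, tf_b)
print(f"similarity: {similarity:.3f}")
print("top terms of first conference:", tf_a.most_common(3))
```

The same term counts can serve as weights for a tagcloud, so both deliverables can share one preprocessing step. A topic-model-based comparison would replace the raw counts by per-conference topic distributions.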

Details

  • See the section "Software languages in academia" in the book chapter on software languages for a list of conferences concerned more or less with software languages.
  • For the actual analysis of topics, some text analysis or IR method has to be picked. This will require a significant study of available methods. For instance, topic models may be used.
  • Some literature study seems to be needed to find out whether existing papers or tools already do more or less what is wanted here.
  • DBLP can be accessed in different ways. There is presumably an XML or JSON dump, and DBLP may also be accessible as a Linked Data resource. It needs to be decided which approach is suitable. The information must not be extracted manually, and it should not be extracted from the actual web view.
  • As is common for any sort of text analysis, common English words (stop words) need to be removed and stemming needs to be applied. NLTK and WordNet could be used for this purpose.
  • There are many technologies out there to derive tagcloud visualizations from appropriate inputs; see, for example, the technologies used for the tagclouds of 101, i.e., 101worker modules with "tagcloud" in their name.
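
The extraction and preprocessing steps above can be sketched as follows. This is a minimal sketch under several assumptions: the input is an XML dump whose records look like the made-up sample (an inproceedings element with title and booktitle children), the tiny stop-word set stands in for a full list such as the one shipped with NLTK, and the suffix stripper stands in for a real stemmer such as NLTK's PorterStemmer. All titles, keys, and venue names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample; the record layout is an assumption about the dump format.
SAMPLE_XML = """<dblp>
  <inproceedings key="conf/xyz/Example1">
    <title>Parsing the Grammars of Modeling Languages.</title>
    <booktitle>XYZ</booktitle>
    <year>2016</year>
  </inproceedings>
  <inproceedings key="conf/xyz/Example2">
    <title>Testing Model Transformations.</title>
    <booktitle>XYZ</booktitle>
    <year>2016</year>
  </inproceedings>
  <inproceedings key="conf/abc/Example3">
    <title>A Type System for Grammars.</title>
    <booktitle>ABC</booktitle>
    <year>2016</year>
  </inproceedings>
</dblp>"""

# Stand-in for a complete English stop-word list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def titles_by_venue(xml_text):
    """Group paper titles by the text of their booktitle element."""
    root = ET.fromstring(xml_text)
    result = {}
    for record in root.iter("inproceedings"):
        venue = record.findtext("booktitle")
        title_el = record.find("title")
        if venue and title_el is not None:
            # itertext() also collects text nested in inline markup.
            result.setdefault(venue, []).append("".join(title_el.itertext()))
    return result


def naive_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(title):
    """Lower-case, tokenize, drop stop words, and stem the remaining words."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def tag_weights(titles):
    """Stem frequencies over all titles, usable as tagcloud weights."""
    counts = Counter()
    for title in titles:
        counts.update(normalize(title))
    return counts


venues = titles_by_venue(SAMPLE_XML)
for venue, titles in venues.items():
    print(venue, tag_weights(titles).most_common(5))
```

A full dump would likely be too large to load with `ET.fromstring`, so a streaming parser such as `ET.iterparse` would be the more realistic choice. The stems produced here are not readable words, so a tagcloud would probably map each stem back to one representative surface form.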