Automatic Tag Suggestions with JavaScript

Feb 06, 2008 by Hjortur Stefan Olafsson

I must admit that, although a stern advocate for new approaches to information management, I am (at least on occasion) a bit lost for ‘words’ when it comes to properly ‘tagging’ my stuff. So, to improve on both this matter and a suitably rainy Sunday (suspiciously frequent in London), I started knocking together a bit of JavaScript that, by way of a layman’s version of unsupervised semantic analysis, attempts to suggest relevant subject tags for a given piece of text.

When it comes to a subjective matter like tagging, concept extraction or classification, it goes without saying that a purely technical approach will never provide a perfect solution.

The aim with AutoTags is to algorithmically suggest a relevant set of extracted terms and concepts, hopefully a few more than us mere mortals can conjure up when starved of caffeine or just about to make a dash for the pub. From these suggestions one could pick and choose the better hits and add any others that the former might jog one to think of.

The approach taken here is obviously limited by the fact that this first iteration doesn’t supplement the text being analysed with a wider corpus. The absence of idf (inverse document frequency) statistics therefore means that I am relying solely on the text in question for accurate tag candidates.
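
To make that limitation a little more concrete, here is a minimal sketch of what ranking tag candidates from a single text, with no idf weighting, might look like. It is not the actual AutoTags code: the function name suggestTags, the tiny stop-word list and the three-letter minimum are all assumptions made up for illustration. Without a wider corpus, a stop-word list is roughly the only defence against common words floating to the top.

```javascript
// Hypothetical stop-word list; a usable one would be far longer.
const STOP_WORDS = new Set([
  "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
  "is", "it", "of", "on", "or", "that", "the", "this", "to", "with"
]);

// Hypothetical helper (not part of AutoTags): ranks terms by raw
// frequency within the given text only, since no corpus is available.
function suggestTags(text, maxTags = 5) {
  // Lowercase the text and pull out runs of letters, keeping
  // word-internal apostrophes (e.g. "doesn't") together.
  const words = text.toLowerCase().match(/[a-z]+(?:'[a-z]+)?/g) || [];

  const counts = new Map();
  for (const word of words) {
    // Skip very short words and stop words.
    if (word.length < 3 || STOP_WORDS.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }

  // Highest count first; ties are broken alphabetically so the
  // output is deterministic.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, maxTags)
    .map(([term]) => term);
}
```

Calling suggestTags(someText, 10) would return up to ten candidate terms, most frequent first. With idf statistics from a wider corpus, each count could instead be weighted down for terms that are common everywhere, which is exactly what this first iteration cannot do.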

But let me know what you think - you can give it a whirl here and all sources are available on autotags.googlecode.com in case you want to look under the hood.

At this point it’s worth mentioning that AutoTags is English-specific and JavaScript-only, but I’m going to try to make it available for other languages and do Java, ActionScript and Python implementations the next time the weather goes awry.