Weeks 7 and 8 of our course are taught by Dr. Carolyn Penstein Rosé, who is on the faculty of the Language Technologies Institute and the Human-Computer Interaction Institute at Carnegie Mellon University. She’s particularly interested in collaborative learning.

The unit is on “text mining.” Text mining is a subset of “data mining.” As far as I can tell (relying heavily on Wikipedia), the goal of data mining is to discover patterns in big datasets and present them in a simpler, often visual, form. In essence, I think that “data mining” is really “predictive analytics.”

Text mining has a number of applications, including classifying movie reviews, understanding medical records, and tracking consumer purchasing.
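To make the movie review example concrete, here is a minimal sketch of one common approach, a Naive Bayes classifier over word counts, written in plain Python. The four training reviews and the function names are made up for illustration; this is not how any particular tool does it, and a real analysis would need far more data.

```python
import math
from collections import Counter, defaultdict

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]

def train_naive_bayes(labeled_docs):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, invented for this sketch.
reviews = [
    ("A wonderful, moving film with great acting", "positive"),
    ("I loved it, great fun from start to finish", "positive"),
    ("Boring plot and terrible acting", "negative"),
    ("A dull, terrible waste of time", "negative"),
]
model = train_naive_bayes(reviews)

print(classify("great fun and wonderful acting", model))
print(classify("terrible and boring", model))
```

The idea is simply that words seen more often in positive reviews push a new review toward "positive," and likewise for negative.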

A potentially very interesting application is automated essay scoring (AES), along with automated feedback for improving essays. LightSide Labs has software that does this in the K-12 realm.

Our assignment is to learn to use LightSIDE, which is an open-source machine learning program (there is no mention of the “data mining” buzzword in the user manual). LightSIDE is built on WEKA, but I’m not yet sure what its advantages over WEKA are.

LightSIDE seems to do the basics: linear and logistic (binomial) regression, tree models, etc. It also has a lot of text processing capabilities. I haven’t had time to read the entire user manual yet, but it seems very good; reading it all would be like taking an entire course in text analysis.
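As I understand it, the text processing step mostly means turning each document into countable features before any regression or tree model sees it. Here is a rough sketch in plain Python of one typical version, counting single words (unigrams) and adjacent word pairs (bigrams). The function names are my own, and this is not LightSIDE's code or API.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(text):
    """Count unigram and bigram features in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, 1) + ngrams(tokens, 2))

# A made-up sentence standing in for one essay.
features = extract_features("the essay makes the point clearly")
for name, count in sorted(features.items()):
    print(name, count)
```

Each essay becomes one row of such counts, and the model then learns which features go with which scores.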

My next step is to choose some kind of analysis to do. I have a bunch of scored essays, so I might start with those. Back soon.








