N-grams

In the few spare minutes I’ve had this week, I’ve been trying out n-grams as a point of comparison with other text-mining processes. This still fits squarely in the category of “a lot to learn,” but I’m happy to be running Perl and various Lingua::EN modules on this (nothing super-complicated in the switch, but all of my previous Perl tinkering was on a PC). Today’s corpus was perhaps too small to yield many insights: just 2,300 words across 10-12 email messages sent via WPA-L on Tuesday, August 24. All the same, insights or no, here are the top five bigrams, with T-scores carried out to the fourteenth decimal place, i.e., the hundred-trillionths position. Such precision is useful, I suppose, for avoiding ties.

Bigrams (T-score, count, bigram)
2.22372310460229	5	audience awareness
1.72376348313075	3	writing centers
1.72376348313075	3	writing spaces
1.41265204574184	2	develop new
1.41109052911059	2	new ways

It looks like “develop new ways” is a trigram that shows up twice in the corpus. This script (a fine one, by the way) splits those three words into two overlapping bigrams, “develop new” and “new ways,” each counted twice. But that’s exactly what it was assigned to do.
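
For anyone curious how scores like these might be produced, here is a minimal sketch in Python (chosen for illustration; it is not the Perl script used above). It assumes the common collocation t-score, (observed - expected) / sqrt(observed), with expected = count(w1) * count(w2) / N. The function name bigram_t_scores, the toy sentence, and the use of the token count as N are all assumptions, so the numbers will not match the table.

```python
import math
from collections import Counter


def bigram_t_scores(tokens):
    """Return (t_score, count, bigram) rows, highest score first."""
    unigrams = Counter(tokens)
    # Every adjacent pair of tokens counts as one bigram.
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)  # assumption: token count serves as the sample size N
    rows = []
    for (w1, w2), observed in bigrams.items():
        # Count expected if the two words co-occurred only by chance.
        expected = unigrams[w1] * unigrams[w2] / n
        t = (observed - expected) / math.sqrt(observed)
        rows.append((t, observed, f"{w1} {w2}"))
    # Sort by score, breaking ties alphabetically by bigram.
    return sorted(rows, key=lambda row: (-row[0], row[2]))


# Hypothetical toy corpus, not the WPA-L messages.
tokens = "we develop new ways to teach and we develop new ways to write".split()
for t, count, bigram in bigram_t_scores(tokens)[:5]:
    print(f"{t:.14f}\t{count}\t{bigram}")
```

Because every three-word sequence contributes two overlapping bigrams, “develop new” and “new ways” each come out with a count of 2 in the toy sentence, much as they do in the table.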

Address Keywords

How best to arrive at keywords (before they are tags)? One humorless punchline is that I will not soon have a degree in computational linguistics. I have dealt
superficially with the question this week, first by thinking about the relationship
of the terms assigned by various methods–where we have keywords at all, that
is. The most prominent journals in composition studies do very little with
keywords, much less with tags (here I am thinking of tags as the digital
iteration of keywords that includes latent, descriptive, and procedural
labeling). Why is that?


Trouble Shot

Even if the following fixes are only useful to one or two people, posting
them to the blog makes them differently available for searching and bookmarking.
Since installing MT3.34, I’ve run across a couple of small snags. Nothing
too off-putting, really. Just bumps along the up-gradual way.

First, the new tagging features in MT3.3+ are, as I’ve said before, really
slick. But I was having trouble with the interface that allows me to merge
tags. Say I have two tags I want to merge, like "method" and "methods."
Okay? I click on one or the other and the tag becomes editable. After I
make my changes, I can select "Rename," in which case it will query the database
to see if the new tag already exists. If it does exist, a JavaScript popup asks
whether I want to proceed with the merge. If the revised tag doesn’t
exist, it goes ahead and applies the change. The other option, "Cancel,"
does just that. Simple, eh?


Perl

For the last twenty-four hours, I’ve been obsessing and scrambling to figure out how to modify a Perl script for a project I’m working on. And just when I was beginning to feel like things were near collapse and failure was imminent, along came an email reminding me that I’ve already got a solid method and usable data without Perl. At the very same moment an unnamed family member said, “You’re worked up over Pearl…from Hee Haw?” (Imagine side-splitting laughter, all at my expense.) But everything’s back to fine and manageable. Just. Like. That.