Data Mining Transcripts

I’m data mining all the transcripts I can get my hands on to help update my extensions for improving player experiences. Thought I’d post an interesting early result here: a list of all the commands entered into Blue Lacuna at IndieCade 2010, sorted by frequency. The winner is the blank line.

(Coincidentally or not, version 10 of Smarter Parser now supports mapping blank lines to LOOK or WAIT.)

Here is some quick and dirty Python code for anyone who wants to do similar analyses:

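commands = {}

# Lines that start with '>' are player commands; everything else is game output.
with open('lacunalogs.txt', 'r') as log:
    for line in log:
        if line.startswith('>'):
            # Drop the prompt and the line ending, and ignore case.
            thisCommand = line[1:].rstrip('\n\r').upper()
            commands[thisCommand] = commands.get(thisCommand, 0) + 1

# Print each command with its count, most frequent first.
for cmd in sorted(commands, key=commands.get, reverse=True):
    print(cmd, commands[cmd])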
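
If you only want the top few commands, the same tally can probably be done more compactly with `collections.Counter` from the standard library. This is just a sketch: the function name `count_commands` and its `prompt` parameter are my own inventions for illustration, and the sample lines are made up, not taken from the real logs.

```python
from collections import Counter


def count_commands(lines, prompt=">"):
    """Tally lines that start with the prompt, ignoring case (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        if line.startswith(prompt):
            # Drop the prompt and the line ending, then normalize case.
            counts[line[len(prompt):].rstrip("\r\n").upper()] += 1
    return counts


# Made-up sample lines for illustration only, not real transcript data.
sample = [">look\n", "You are standing on a beach.\n", ">\n", ">LOOK\n", ">\n", ">\n"]

# most_common(n) returns the n most frequent (command, count) pairs.
for cmd, n in count_commands(sample).most_common(3):
    print(repr(cmd), n)
```

To run it on a real log, pass an open file instead of the sample list, since iterating over a file yields its lines.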

Also, if you have an interesting transcript data set, I’d love to take a look at it.

Similar work here.