IF Comp is profiling voters

Four days ago, the IF Comp software was updated to monitor voting behaviour. Voters who make too many “extreme” votes are put on a list of shills.

[code]
+# A shill is a voter who casts many extreme votes (10s or 1s) for the
+# current comp
+sub get_possible_shills {
+
+  my ($self) = @_;
+  my $db = $self->result_source->schema;
+  $db->storage->debug(0);
+  my $votes = $db->resultset("User")->search({
+      'entry.comp' => $self->id,
+    },
+    {
+      join => { 'votes' => 'entry' },
+      'select' => [
+        'me.id',
+        { count => 'votes.id', '-as' => 'total_vote_count' },
+        { sum => 'CASE WHEN votes.score = 10 OR votes.score = 1 THEN 1 ELSE 0 END', '-as' => 'extreme_score_count'}
+      ],
+      as => [ 'me.id', 'total_vote_count', 'extreme_score_count'],
+      group_by => 'me.id',
+    }
+  );
+
+  # Shills have voted a lot (> 5 times) and they have extreme votes
+
+  my $data = [
+    map {
+      {
+        user => $db->resultset("User")->find($_->get_column("id")),
+        total_vote_count => $_->get_column("total_vote_count"),
+        extreme_score_count => $_->get_column("extreme_score_count")
+      };
+    }
+    grep {
+      ($_->get_column("extreme_score_count") / $_->get_column("total_vote_count")) >= 0.5
+        && $_->get_column("total_vote_count") > 5
+    } $votes->all ];
+
+  $db->storage->debug(0);
+  return $data;
+}
[/code]

At any time the IF Comp admin can view the list of “extreme” voters.

+               [% IF shills.size %]
+               <h4>Possible Shills</h4>
+               
+               <table class="table table-condensed">
+                 <thead>
+                   <tr>
+                     <th>Email</th>
+                     <th>Extreme Score Ratio</th>
+                   </tr>
+                 </thead>
+                 <tbody>
+                   [% FOREACH shill = shills %]
+                   <tr>
+                     <td>[% shill.user.email %]</td>
+                     <td>[% shill.extreme_score_count %]/[% shill.total_vote_count %]</td>
+                   </tr>               
+                   [% END %]
+                 </tbody>
+               </table>
+               [% END %]

Voters who score five or fewer games are exempt: the check only flags someone with more than 5 votes, at least half of which are 1s or 10s. A ballot of 10, 10, 10, 10, 10 (5 tens) is OK. A ballot of 10, 10, 10, 10, 10, 10 (6 tens) is extreme.
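For anyone who doesn't read Perl, here is a rough Python sketch of the same threshold as the `grep` condition in the diff. The function name and the `min_votes` / `min_ratio` parameters are my own labels, not anything in the IFComp codebase; only the numbers (more than 5 votes, ratio of at least 0.5) come from the quoted code.

```python
def is_possible_shill(scores, min_votes=5, min_ratio=0.5):
    """Return True when a ballot matches the condition in the quoted diff:
    more than `min_votes` votes, and at least `min_ratio` of them 1s or 10s."""
    total = len(scores)
    extreme = sum(1 for s in scores if s in (1, 10))
    # The vote-count test runs first, so an empty ballot never divides by zero.
    return total > min_votes and extreme / total >= min_ratio


print(is_possible_shill([10] * 5))  # five tens: False (too few votes)
print(is_possible_shill([10] * 6))  # six tens: True
```

Note that the ratio test is "at least half", so a six-game ballot with three 10s and three middling scores is also flagged.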

Reviewing my votes, if the random shuffle I received from the server had been in a different order, I would have been labelled a shill.
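To illustrate the ordering point: since the list can be viewed at any time, what matters is the partial ballot at that moment. The sketch below uses made-up scores (not my real ballot) and a hypothetical helper, `flagged_at`, that reports after how many votes a ballot would have matched the rule from the diff.

```python
def is_possible_shill(scores, min_votes=5, min_ratio=0.5):
    """Same condition as the quoted diff: > 5 votes, >= 50% of them 1s or 10s."""
    total = len(scores)
    extreme = sum(1 for s in scores if s in (1, 10))
    return total > min_votes and extreme / total >= min_ratio


def flagged_at(scores):
    """Return the vote counts at which the partial ballot would be flagged."""
    return [n for n in range(1, len(scores) + 1) if is_possible_shill(scores[:n])]


# Hypothetical ballot: 4 extreme scores out of 12, so the finished ballot is fine.
early_extremes = [10, 1, 10, 6, 7, 10, 5, 4, 6, 7, 3, 8]
# The same scores, reordered so the extreme ones happen to come last.
late_extremes = sorted(early_extremes, key=lambda s: s in (1, 10))

print(flagged_at(early_extremes))  # [6, 7, 8]
print(flagged_at(late_extremes))   # []
```

Same twelve scores either way; only the order in which the games came up decides whether the ballot ever appears on the list.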

Note the word “possible”. The software does not make decisions; it reports situations which the administrators may wish to look at.

It has always been the case that the administrators monitor the votes for possible manipulation. That’s necessary in any Internet-based voting system.

What’s the idea here? I would have thought a disingenuous voter was more likely to make only the bare minimum number of votes.

It seems imprudent to make fraud detection code visible to the public, though I admit that it’s fun in this case to see how the sausage is being made.

Thank you for the code review!

It is as Zarf said; this is a tool to point out patterns on ballots that might deserve further human attention, and certainly does not imply any wrongdoing of those who cast said ballots.

This was added as part of a set of new features to support the efforts of a “vote counter”, one of the new volunteer roles I added this year to take some of the IFComp cognitive load off of just-me-alone. We’re still experimenting with it, and I appreciate the interest from the community. (Frankly, I am not sure that the web interface is the best way to do this particular bit of business; it might be nicer to have a simple CSV export function that lets the volunteer play with the data however they wish in a spreadsheet. But in any case it’s better than the mile-long raw SQL queries I’ve used in years past!)
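A CSV export along those lines could look something like the sketch below. This is purely illustrative and written in Python rather than the site's Perl; the `export_vote_summary` function, the `(user_id, score)` input shape, and the sample data are all assumptions. The column names are borrowed from the aliases in the quoted query.

```python
import csv
import io
from collections import defaultdict


def export_vote_summary(votes, out):
    """Write one CSV row per voter: total votes and how many were 1s or 10s.

    `votes` is an iterable of (user_id, score) pairs (an assumed input shape).
    """
    totals = defaultdict(int)
    extremes = defaultdict(int)
    for user_id, score in votes:
        totals[user_id] += 1
        if score in (1, 10):
            extremes[user_id] += 1

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["user_id", "total_vote_count", "extreme_score_count"])
    for user_id in sorted(totals):
        writer.writerow([user_id, totals[user_id], extremes[user_id]])


# Made-up votes for two voters.
sample_votes = [(1, 10), (1, 1), (1, 7), (2, 5), (2, 6)]
buffer = io.StringIO()
export_vote_summary(sample_votes, buffer)
print(buffer.getvalue())
```

Exporting raw counts instead of a pre-filtered list leaves the choice of threshold to whoever opens the spreadsheet.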

I do apologize for some of the language, which I can see now is more provocative than intended. We’ll clean it up in a future patch.

  1. Is “shill voter” the official term used for these individuals?

  2. Once this software pings you, and the comp officiating committee decides you are a potential shill, does the accused then get the opportunity to plead their case before the committee makes a final ruling?

  3. Are judges allowed the full length of the judging period before the committee renders its judgement on the shills? What may look like shill voting at first could just be a preliminary assessment of the competition before fine-tuning the final scores.

Thank you.

on the off-chance jason misses this, i haven’t ever seen a comp runner or community figurehead decide that someone’s been acting in bad faith without reaching out to communicate with that person or confirm their suspicions. even in cases of suspected cheating, there’s still been a significant amount of conversation and back-and-forth involved. i wouldn’t be worried about jmac or anyone else abruptly deciding that people’s votes won’t count or that they’ll be banned from voting in the comp.

There’s no official term whatsoever for this situation, except for “voting in bad faith”.

Regardless of whether the officiating body is calling it being a “shill voter” or “voting in bad faith”, is there an appeals process you can go through once you are placed on this list? Also, is this database of names shared collectively across all interactive fiction game comps, or just the ifcomp?

Essentially they created a routine to automate the process of identifying anomalous voting behavior. That would seem to be the only change. This has probably been done manually in the past. It is stated there is always a human in the process that you can talk to, so a formal “appeals process” would seem to be overkill.

In my four years of participation there has never been any public call out in the vein of “these five people voted anomalously.”

I can speak only as the IFComp organizer. As such, I do everything I reasonably can to make sure that everyone who casts a ballot does so in good faith, and that does include communicating with judges directly should things look a bit off for whatever reason. I feel it would be rather counterproductive of me to capriciously reject ballots based on nothing but data, especially at the relatively low levels of activity we deal with (with a final ballot-count in the low hundreds).

There are no rules for judges to meet other than those listed on the website’s rules page. The frustrated confusion I had observed regarding the authors’ no-communication rule helped encourage me to try relaxing that rule this year, and I certainly have no intention of fostering anything like this among the judge rules. I really do want as many people to participate in IFComp as possible!

I regret the error on my part of approving the code in question into the main IFComp codebase, as I can see clearly now how it misrepresents certain IFComp organizational practices. I have since removed it, with apologies.

As a judge, I find the program to be a very practical tool for assisting the officiating body in their many responsibilities, one of many I am sure. I was also not concerned over the word “shill” in the code, as that is a proper English word to describe who the program is attempting to target. My only concerns were that we judges be given ample opportunity to defend our scores if ever they are brought into question, and how long we have to finalize our scores before the judging deadline (and before the heat is put on us over potential shilling). Managing (in some circumstances) 58 (?) games is as daunting as it is enjoyable and satisfying when you are trying to score each game accurately, so that it is fair to the judge, the author, and the community as a whole.

Based on what I have read so far, I get the impression that the officiating body is more than reasonable in giving judges the benefit of the doubt, and giving them more than enough opportunity to present their case if ever one arises.

Thank you.

The only possible answer is that you should vote by the voting deadline. It’s the responsibility of the vote-counters to fairly handle all the votes that come in on time.

As someone who has seen a decent amount of commented code, I assure you this is polite compared to many applications you may use regularly.

I just got done playing, scoring, and reviewing the last of the 56 games I am allowed to judge. I will have no problem bringing many of the lower scores up to more reasonable ones in the next few days; it is actually the next thing on my priorities list. The problem is bringing some of the higher scores down. I currently have thirteen games scored at a 10. I may be able to pull a few of them down to a 9, but even then the proportion appears too high to avoid the suspicion (mild curiosity?) of the comp officiating body. Do they give you at least a day to gather and prepare your notes before the questioning begins? Thank you.

I think you might be being a little too paranoid about this? It’s not like they’re going to perform a full-scale investigation just because you gave high scores to a lot of things. As long as you’re not voting in bad faith, I don’t think there’s anything to worry about.

50% Lack of sleep from all of the games.
25% Concern over disproportionate amount of 10s.
20% Attempt at dry humor.
15% Bad at math.

Speaking as one of this year’s IFComp volunteers (aka technically part of the comp officiating body), you’re getting way too concerned about this.

Judges get to determine their own metrics. If the comp doesn’t look like a bell curve to you, don’t warp your results.

Ok, thank you for clearing that up.

To pile things on, the Central Limit Theorem says that, even if your scores aren’t a bell curve, the combination of others’ will probably make for one in the overall score distribution. This isn’t super rigorous, but it should give people a bit more concrete relief that their scoring doesn’t need to fit any sort of scale, be it linear, bell curve, hourglass or even weighted more to 1’s or 10’s.
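If you want to see the effect rather than take it on faith, here is a small simulation sketch. It makes a strong assumption that real voting doesn't satisfy: every judge scores independently from the same lopsided "hourglass" scale. What it shows is the narrower thing the theorem actually covers, namely that a game's average score across many such judges clusters in a roughly bell-shaped way around the scale's mean. The function names and the score pool are invented for the demo.

```python
import random
import statistics

# A deliberately non-bell-shaped personal scale: mostly 1s and 10s.
HOURGLASS_POOL = [1, 1, 1, 10, 10, 10, 5, 6]


def hourglass_score(rng):
    """Draw one judge's score for one game from the hourglass scale."""
    return rng.choice(HOURGLASS_POOL)


def mean_scores(num_judges, num_games, seed=0):
    """Average score per game when `num_judges` judges each score every game."""
    rng = random.Random(seed)
    return [
        statistics.mean(hourglass_score(rng) for _ in range(num_judges))
        for _ in range(num_games)
    ]


one_judge = mean_scores(num_judges=1, num_games=2000)
many_judges = mean_scores(num_judges=100, num_games=2000)

# With one judge the "average" is just the raw hourglass score; with 100
# judges the averages bunch up tightly around the middle of the scale.
print("spread with 1 judge:   ", round(statistics.pstdev(one_judge), 2))
print("spread with 100 judges:", round(statistics.pstdev(many_judges), 2))
```

The spread of the averages shrinks by roughly the square root of the number of judges, which is why one judge's unusual scale has little pull on the final ranking.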

That said, I have seen reviewers give sub-rankings among games with the same score. That lets them see if they maybe want to tweak a couple of scores, e.g. if they realize they missed the alternate ending to The Ascot, without adding pressure to actually do so. Or if they really thought Game X was 4 whole points better than Game Y. But it’s hardly necessary.