At this point, I’m sure that everyone has been followed by one of many spam Twitter accounts.
It’s pretty obvious that these are bogus, if not by the lack of actual content, then by the small number of followers compared to the massive number of friends.
So let’s use the Twitter API to write a little script that detects these accounts and automatically blocks them.
The first thing to do is to get the list of followers for your Twitter account. All you have to do is send a GET request to http://twitter.com/statuses/followers.json (change the .json to .xml to get XML back instead). The request should send, via Basic Authentication, your Twitter username and password.
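Here's a rough sketch of what that request looks like. It's written for Python 3's urllib.request (the full script further down uses the older Python 2 modules), and it only builds the request rather than sending it. The helper names basic_auth_header and build_followers_request, and the alice/secret credentials, are made up for illustration.

```python
import base64
import urllib.request

FOLLOWERS_URL = 'http://twitter.com/statuses/followers.json'


def basic_auth_header(username, password):
    """Return the Authorization header used by HTTP Basic Authentication."""
    credentials = ('%s:%s' % (username, password)).encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    return {'Authorization': 'Basic %s' % token}


def build_followers_request(username, password):
    """Build (but do not send) the GET request for the followers list."""
    headers = basic_auth_header(username, password)
    return urllib.request.Request(FOLLOWERS_URL, headers=headers)


# Hypothetical credentials, only here to show the shape of the request.
req = build_followers_request('alice', 'secret')
print(req.get_method(), req.full_url)
print(req.get_header('Authorization'))
```

Passing the request to urllib.request.urlopen() would actually send it; I've left that out so the sketch runs without touching the network.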
The above gives you a nice list of Twitter users with all sorts of information, including their Twitter user ID, number of followers, and number of friends. From there, it’s simple to loop over the list and check the ratio of followers to friends.
I found
(followers / friends) * 100 < 5
to be a nice threshold. In other words, an account gets flagged when it has fewer than 5 followers for every 100 friends.
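As a minimal sketch, the check could look something like this. The field names followers_count, friends_count and screen_name are the ones the full script reads from the API response; the function names and the sample accounts are invented for the example. Skipping accounts with zero friends is my own assumption, made to avoid dividing by zero.

```python
def follower_ratio(followers_count, friends_count):
    """Followers as a percentage of friends, or None if there are no friends."""
    if friends_count == 0:
        return None
    return (float(followers_count) / friends_count) * 100


def looks_like_spam(user, threshold=5):
    """True when the account's ratio falls under the threshold."""
    ratio = follower_ratio(user['followers_count'], user['friends_count'])
    return ratio is not None and ratio < threshold


# Made-up accounts, shaped like the entries in the followers list.
sample = [
    {'screen_name': 'real_person', 'followers_count': 120, 'friends_count': 150},
    {'screen_name': 'spam_bot', 'followers_count': 3, 'friends_count': 2000},
    {'screen_name': 'lurker', 'followers_count': 10, 'friends_count': 0},
]

flagged = [u['screen_name'] for u in sample if looks_like_spam(u)]
print(flagged)  # prints ['spam_bot']
```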
Now that we’ve identified the bogus accounts, we can use the API call to block them. In this case, we do an HTTP POST to http://twitter.com/blocks/create/id.json, passing the user ID to block as the single POST value.
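Sketched out in Python 3, the block request might be built like this. Again it only constructs the request and doesn't send it. build_block_request, the user ID 12345, and the placeholder token are hypothetical; the auth header would be the same Basic Authentication header used for the followers call.

```python
import urllib.parse
import urllib.request


def build_block_request(user_id, auth_header):
    """Build (but do not send) the POST request that blocks one user."""
    url = 'http://twitter.com/blocks/create/%s.json' % user_id
    # Giving Request a data payload is what turns it into a POST.
    data = urllib.parse.urlencode({'id': user_id}).encode('ascii')
    return urllib.request.Request(url, data=data, headers=auth_header)


# Placeholder header value; a real call needs the Basic auth token.
req = build_block_request(12345, {'Authorization': 'Basic <token>'})
print(req.get_method(), req.full_url, req.data)
```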
At the end of this post is a complete Python script that will do all of the above. In the process of writing the script, I got to learn about parsing Python script command-line arguments using optparse.
Using optparse makes handling command-line arguments dead simple.
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-d", "--dry-run", action="store_true", dest="dryrun", help="Displays accounts that would be blocked.")
(options, args) = parser.parse_args()
if options.dryrun:
    print "--dry-run or -d was found on the command-line."
else:
    print "No --dry-run or -d found."
The other nice thing is that optparse automatically handles --help (or -h) and prints out a nice help message based on the help text passed to the add_option() method of optparse.
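To see the generated help text without running anything from a shell, here's a small Python 3 sketch. Note that it leans on two optparse features the full script doesn't use: type='int' with a default value, and the %default placeholder in help strings. build_parser is just a name I picked for the example.

```python
from optparse import OptionParser


def build_parser():
    """Create a parser with a flag and an integer option."""
    parser = OptionParser(usage='%prog [options]')
    parser.add_option('-d', '--dry-run', action='store_true', dest='dryrun',
                      default=False,
                      help='Displays accounts that would be blocked.')
    parser.add_option('-t', '--threshold', type='int', dest='threshold',
                      default=5,
                      help='Ratio threshold (default: %default)')
    return parser


parser = build_parser()

# Parse an explicit argument list instead of sys.argv.
options, args = parser.parse_args(['--dry-run', '-t', '10'])
print(options.dryrun, options.threshold)

# format_help() returns the same text that -h or --help would print.
print(parser.format_help())
```

Letting optparse convert the threshold to an int means a non-numeric value is rejected with a usage error before any of your own code runs.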
Here’s the complete script, which I hope you find useful. I’m releasing it under the WTFPL license. No warranties, blah blah, not my fault if it breaks your computer, squishes your kitten, etc.
#!/usr/bin/python
import urllib
import urllib2
import base64
import json
import sys
from optparse import OptionParser
#################################################################################
username = ''
password = ''
verbose = False
def twitterRequest(url, username, password, values=None):
    # Sends a request using Basic Authentication: a POST when values is given, a GET otherwise.
    b64str = base64.b64encode('%s:%s' % (username, password))
    header = {'Authorization': "Basic %s" % b64str}
    if values is not None:
        values = urllib.urlencode(values)
    req = urllib2.Request(url, values, header)
    res = urllib2.urlopen(req)
    return json.loads(res.read())

def blockExists(id, username, password):
    # An HTTP error from this call is treated as "this user isn't blocked yet".
    try:
        twitterRequest('http://twitter.com/blocks/exists/%(id)s.json' % {'id': id}, username, password)
        return True
    except urllib2.HTTPError, e:
        return False

def blockUser(id, username, password):
    # Only issue the block if one isn't already in place.
    if not blockExists(id, username, password):
        twitterRequest('http://twitter.com/blocks/create/%(id)s.json' % {'id': id}, username, password, values={'id': id})

def vMsg(msg):
    if verbose:
        print msg
#################################################################################
if __name__ == "__main__":
    cliError = False
    doBlock = False
    parser = OptionParser()
    parser.add_option("-d", "--dry-run", action="store_true", dest="dryrun", help="Displays accounts that would be blocked.")
    parser.add_option("-b", "--block", action="store_true", dest="block", help="Blocks accounts that fall under the specified threshold")
    parser.add_option("-v", "--verbose", action="store_true", dest="verbose", help="Print detailed status messages")
    parser.add_option("-t", "--threshold", dest="threshold", help="The threshold accounts must fall under before they're blocked (default is 5)")
    parser.add_option("-u", "--username", dest="username", help="Twitter username")
    parser.add_option("-p", "--password", dest="password", help="Twitter password")
    (options, args) = parser.parse_args()
    # Handle command-line argument logic to make sure this is a valid call
    if options.threshold is None:
        threshold = 5
    else:
        threshold = int(options.threshold)
    # Exactly one of --dry-run and --block must be given
    if (options.dryrun is None and options.block is None) or (options.dryrun is not None and options.block is not None):
        print "You must select either --dry-run or --block."
        cliError = True
    if options.username is None:
        print "Username required."
        cliError = True
    if options.password is None:
        print "Password required."
        cliError = True
    if cliError:
        print ""
        parser.print_help()
        sys.exit(1)
    username = options.username
    password = options.password
    verbose = options.verbose
    doBlock = options.block
    # All of the command-line stuff was OK so continue with the scanning & blocking
    followers = twitterRequest('http://twitter.com/statuses/followers.json', username, password)
    spamCount = 0
    for follower in followers:
        followerCount = float(follower['followers_count'])
        friendCount = float(follower['friends_count'])
        # Accounts with no friends can't be scored; skip them to avoid dividing by zero
        if friendCount == 0:
            continue
        ratio = (followerCount / friendCount) * 100
        if ratio < threshold:
            spamCount = spamCount + 1
            if doBlock:
                prefix = "Blocking Account:\t"
                blockUser(follower['id'], username, password)
            else:
                prefix = "Possible Spam Account:\t"
            print prefix + str(follower['id']) + "\t\t" + follower['screen_name'] + "\t\t" + str(followerCount) + "\t\t" + str(friendCount) + "\t\t" + str(ratio)
    if spamCount <= 0:
        vMsg("No followers were flagged as potential spam accounts.")
    sys.exit(0)