Your browser is very old. You might enjoy surfing the web more if you used something newer like:

Google Chrome

Even Firefox would be OK.

If you're being forced at gunpoint to use Internet Explorer, you should at least upgrade it. Version 8 is tolerable and 9 will be OK when it comes out.

Posts tagged “googleapi”

Adding/Deleting Events with the Google Calendar API

I recently had a chance to fool around with the Google Calendar API, specifically adding and deleting events.

I’ve talked in the past about using Google Client Logins to access the Google Data API and we’ll use that same ClientLogin code to authenticate our requests to the Calendar API.

Authenticating

There are different ways to authenticate requests to Calendar depending on the type of application you’re writing. If you’re writing a front-end that a user is going to interact with, it’s a good idea to use AuthSub proxy authentication which will redirect the user, let them login with their Google Account, and send them back to your application.

In our case, we’re writing PHP code that will modify the calendar behind the scenes. For that scenario, we’ll use ClientLogin authentication and use the class we wrote last time

In additional to needing a Google Account for authenticating, you’ll also need the Calendar ID if the calendar being used is not the default calendar. The Calendar ID can be found near the bottom of the page in the Settings for the calendar and is just a special, randomly generated email address.

Adding an Event

Adding an event is pretty straightforward. It’s just a matter of sending an authenticated <entry> XML packet via HTTP POST to the calendar URL which is the following:


http://www.google.com/calendar/feeds/userId/private/full

The userId is either your Google Account email address (if using the default calendar) or the Calendar ID mentioned above (if using any other calendar). You also need to make sure the Content-Type of the POST request is application/atom+xml.

The event XML looks like this:

$xml = "
          
          {$params["title"]}
          {$params["content"]}
          
          
          
          
          
          
        ";

A request is authenticated by including the Authorization: GoogleLogin auth=”authToken” header in the request. The authToken value comes back with the initial ClientLogin request. If you’re using my GoogleClientLogin class, the authToken can be retrieved with the getAuth() method.

One thing to watch for is the initial POST request may come back with an HTTP 302 redirect. If it does, the redirect url will contain a gsessionid. In that case, append the gesessionid to the calendar URL and resend the exact same POST request.

If the entry is successfully added, you’ll get back an XML response containing the entry as well as HTTP headers with the ETag (more on this in a minute) and the entry’s edit link (more on that too).

Deleting an Event

Deleting an event is even easier. Just send an HTTP DELETE request to the edit link you got back when you added the event. This can be pulled from either the response XML when you add the event or from the response “Location” header.

If want to make sure you don’t delete an event that’s been modified by someone else, include the If-Match: etag header where etag is the ETag returned when you added the event. If you don’t care, just send If-Match: *

Miscellaneous

I didn’t know how to send an HTTP DELETE with PHP but it turns out to be pretty simple. Just set the curl option CURLOPT_CUSTOMREQUEST to ‘DELETE’ and off you go.

Also, there are currently two versions of the Google Data Protocol. The current is Version 2 and Version 1 is slowly being phased out. All of the above refers to Version 2 and you want to make sure Google Calendar knows that’s the version you’re using. To that end, make sure to include the header GData-Version: 2 with every request.

Code!

And, to illustrate all of the above, here’s a class for encapsulating some of this stuff.

class GoogleCalendar {
    public $data;
    public $xml;

    public function __construct($login=null, $magicCookie="") {
        $this->data = array();

        if(!is_null($login)) {
            $this->login = $login;
        }
        $this->magicCookie = $magicCookie;
    }

    //Return the authorization header used to authenticate all requests after the first one
    protected function getAuthHeader() {
        return 'Authorization:  GoogleLogin auth="' . $this->login->getAuth() . '"';
    }

    //If the calendar we are accessing is the default, the email address is the same as the email used to login.
    //If the calendar is NOT the default, the email address can be found in the Calendar Settings
    //and should be used as the altEmail
    protected function getFeedEmail() {
        return $this->altEmail ? $this->altEmail : $this->login->email;
    }

    //Adding an entry returns the ETag value as part of the HTTP header
    //This function parses the header and attempts to find the ETag and return it
    //$retFields should be the response exploded using \n as the delimeter
    protected function getETagFromHeader($retFields) {
        return $this->getHeaderFromRegex($retFields, "/^ETag:\s*(.*?)$/");
    }

    protected function getEditLinkFromHeader($retFields) {
        return $this->getHeaderFromRegex($retFields, "/^Location:\s*(.*?)$/");
    }

    protected function getHeaderFromRegex($retFields, $regex) {
        if(is_array($retFields)) {
            foreach($retFields as $header) {
                $matches = array();

                if(preg_match("$regex", $header, $matches)) {
                    return $matches[1];
                }
            }
        }else {
            throw new Exception("The header could not be found because the header array was invalid.");
        }

    }

    //Adds an event to the calendar
    public function addEvent($params) {
        $url = "http://www.google.com/calendar/feeds/{$this->getFeedEmail()}/private/full";

        //startTime should be a time() value so we can convert it into the correct format
        $params["startTime"] = date("c", $params["startTime"]);

        //If no end-time is specified, set the end-time to 1 hour after the start-time
        if(!array_key_exists("endTime", $params)) {
            $params["endTime"] = date("c", strtotime($params["startTime"])+60*60*1);
        }

        $xml = "
                  
                  {$params["title"]}
                  {$params["content"]}
                  
                  
                  
                  
                  
                  
                ";

        //Do the initial POST to Google
        $ret = $this->calPostRequest($url, $xml);

        //If Google sends back a gsessionid, we need to make the request again
        $matches = array();
        if(preg_match('/gsessionid=(.*?)\s+/', $ret, $matches)) {
            $url .= "?gsessionid={$matches[1]}";
            $ret = $this->calPostRequest($url, $xml);
        }

        //Parse the XML response (which contains the newly added entry)
        $retFields = explode("\n", $ret);
        //print_r($retFields);
        $entryXML = simplexml_load_string($retFields[count($retFields)-1]);

        //Return an array containing the entry id (url) and the etag
        return array(
                "id"=> (string)$entryXML->id,
                "etag"=> $this->getETagFromHeader($retFields),
                "link"=> $this->getEditLinkFromHeader($retFields)
                );
    }

    public function deleteEvent($url) {
        return $this->calDeleteRequest($url);
    }

    public function calGetRequest($url) {
        $curlOpts = array();
        return $this->calCurlRequest($url, $curlOpts);
    }

    public function calPostRequest($url, $data) {
        $curlOpts = array(
            CURLOPT_POST=> true,
            CURLOPT_POSTFIELDS=> $data,
            CURLOPT_HEADER=> true,
            CURLOPT_HTTPHEADER=> array('GData-Version:  2', $this->getAuthHeader(), 'Content-Type:  application/atom+xml')
        );
        return $this->calCurlRequest($url, $curlOpts);
    }

    public function calDeleteRequest($url) {
        $curlOpts = array(
            CURLOPT_CUSTOMREQUEST=> "DELETE",
            CURLOPT_HTTPHEADER=> array('GData-Version:  2', $this->getAuthHeader(), 'If-Match:  *')
        );
        return $this->calCurlRequest($url, $curlOpts);
    }

    //This is a generic function for doing curl requests
    //It expects a url and an array of CURLOPT values.  Certain defaults are set if not provided
    private function calCurlRequest($url, $curlOpts) {
        if(!array_key_exists(CURLOPT_FOLLOWLOCATION, $curlOpts)) {
            $curlOpts[CURLOPT_FOLLOWLOCATION] = true;
        }
        if(!array_key_exists(CURLOPT_RETURNTRANSFER, $curlOpts)) {
            $curlOpts[CURLOPT_RETURNTRANSFER] = true;
        }
        if(!array_key_exists(CURLOPT_HEADER, $curlOpts)) {
            $curlOpts[CURLOPT_HEADER] = false;
        }
        if(!array_key_exists(CURLOPT_HTTPHEADER, $curlOpts)) {
            $curlOpts[CURLOPT_HTTPHEADER] = array('GData-Version:  2', $this->getAuthHeader());
        }

        $ch = curl_init($url);
        curl_setopt_array($ch, $curlOpts);
        $ret = curl_exec($ch);
        curl_close($ch);

        return $ret;
    }

    public function __get($name) {
        return $this->data[$name];
    }

    public function __set($name, $val) {
        $this->data[$name] = $val;
    }
}

Here’s a simple example that adds an entry. The GoogleClientLogin class can be found here:

define("APP_NAME", "MY APP");
$email = "my.account@gmail.com";
$password = "mypassword";
$altEmail = "this_is_the_random_email_from_the_calendar_settings";
$login = new GoogleClientLogin($email, $password, GoogleClientLogin::$CALENDAR_SERVICE, APP_NAME);

$cal = new GoogleCalendar($login);
$cal->altEmail = $altEmail;

$entryData = $cal->addEvent(array(
                "title"=> "Auto Test event",
                "content"=> "This is a test event",
                "where"=> "Test location",
                "startTime"=> time()+60*60*24*1
            ));
print_r($entryData);

Converting an RFC 3339 date to a Python timestamp
(plus an update to my Google Docs backup script)

I’ve been working on updating the script to backup my Google docs.

One of the biggest issues with the current version is that it’s dumb and always downloads every single file, whether that file has changed or not. The nightly download is getting slower and slower so I figured it was time to make the script a bit smarter.

It turns out that the feed containing the document list has an <updated> property which is, obviously, that last date/time the file was updated. Pretty handy, huh?

The problem is that date/time stamp was in a format I’d never seen before:

2009-01-26T01:47:26.036Z

What?

After some digging, I found the Entry Update Date mentioned in the Protocol Reference which helpfully informed me that the Date is in RFC 3339 format.

Again, what?

A little more digging lead me to the fact that RFC3339 is the date format used in ATOM feeds, which makes sense since that’s exactly what the Google Docs document list is. The format itself is pretty straightfoward. The T separates the date and time portion and the Z is used to specify a numeric time zone offset. In this case, no Z value is provided so we know we’re dealing with GMT.

My next problem was how best to turn this RFC3339 date into a Python timestamp. My first instinct was to hack together a quick regex and be done with it but, in the end, I decided check with StackOverflow to see if there was a better way to do it.

And I’m glad I checked because I was immediately pointed to PyFeed by Steven Hastings which, in addition to a handy library for parsing ATOM feeds, includes a set of functions for manipulating RCF3339 dates. In particular, the tf_from_timestamp(ts_string) function which takes an RFC3339 string and returns a Python timestamp.

The current version of Doxworker (thanks to Ben for the name, fixing the spreadsheet downloads, and folder support) can be downloaded here. You’ll need to modify doxworker.cfg and you may also need to modify main.py depending on the location of your doxworker.cfg file.

Backing up your Google Docs

I’ve gradually become a big fan of Google Docs over the last few months. My process for writing most of these posts has been to write the first drafts on Google Docs, then copy things over to WordPress and clean up any lingering formatting issues. I still wouldn’t consider it to be a full-fledged replacement for Microsoft Office (which is also how I feel about Open Office, a discussion for another day) but it meets my basic needs.

Being to able to access everything from any computer is also a nice improvement to my old process. I used to compose posts in gVim, keep the revisions in Subversion, and then copy over to WordPress. This ended up being pretty cumbersome when switching from machine to machine.

The biggest problem I have with Google Docs is that I don’t entirely trust it yet. It’s a black blox and the inner workings are only visible to a third-party. What would happen if Google Docs blew up or they decided to start charging for it or there was just some random glitch and all of my data disappeared? I don’t ever want to be in a situation where the only copy of my data is in a place that I don’t have full control over.

The easiest way to take care of that is to always have a copy of my data somewhere else, ideally back at my house where it can be included in my existing offsite backup strategy. Google Docs does have the ability to save a document as a file to a variety of formats (just right-click on it in the items list) but, as we all should know by now, manual backups don’t work. I want something that will just sit in the background and download all of my Google Docs stuff automatically without ever having to think about it.

And it turns out that it’s pretty easy to write something to do just that, thanks to the Google Documents List Data API. All of the code snippets are in Python (because I need some Python practice) but it all boils down to calling different URLs so a PHP version should be pretty easy to come up with.

Step 1: Authentication

I’ve talked about authenticating against a Google account using PHP. Here’s a simple little snippet to get the Auth code using Python instead:

def getAuthInfo( email, password, source, service = 'writely', accountType = 'GOOGLE'):
    loginUrl = 'https://www.google.com/accounts/ClientLogin'

    loginData = {
            'accountType': accountType,
            'Email': email,
            'Passwd': password,
            'service': service,
            'source': source,
            'session': 1
            }

    req = urllib2.Request( loginUrl , urllib.urlencode(loginData))
    res = urllib2.urlopen(req)

    data = res.read()

    authInfo = {}

    for item in data.split():
        fields = item.split('=')
        authInfo[fields[0]] = fields[1]

    return authInfo

The service parameter tells Google which service you’re authenticating against. The default here is writely which is the Document List service. The source parameter is just a string uniquely identifying your app. The return value is a dictionary of the different response values. The one you’re generally interested in is Auth.

Step 2: Getting the list of documents

You can retrieve a list of documents by sending an authenticated request to

http://docs.google.com/feeds/documents/private/full

A request is authenticated by including the auth value retrieved in Step 1 as part of the HTTP header. The actual format of the header looks like this:

Authorization: GoogleLogin auth=AuthValue

and here’s the Python code for actually retrieving the list:

def getDocList(auth):
    docListUrl = 'https://docs.google.com/feeds/documents/private/full'

    header = {'Authorization': 'GoogleLogin auth=' + auth}

    req = urllib2.Request( docListUrl, None, header)
    res = urllib2.urlopen(req)

    data = res.read()

    return data

You can see how the HTTP header is just a Python dictionary that we pass to the URL.

The response to this API call is a big block of XML containing a list of entry elements for each document. An <entry> element looks like this:


  
  
    test.user
    test.user@gmail.com
  
  
  
  http://docs.google.com/feeds/documents/private/full/document%3Adocument_id


  Test Document
  2007-07-03T18:02:50.338Z

The label attribute of the first <category> child element contains the type of the document. This is important because different URLs are required for downloading Docs vs. Presentations vs. Spreadsheets.

This is a simple function for retrieving the document list:

def getDocList(auth):
    docListUrl = 'https://docs.google.com/feeds/documents/private/full'

    header = {'Authorization': 'GoogleLogin auth=' + auth}

    req = urllib2.Request( docListUrl, None, header)
    res = urllib2.urlopen(req)

    data = res.read()

    return data

This code snippet that shows authenticating, getting the document list, and looping over each entry in the list. This version is extracting the document id from the URL in the <id> element. I need to check to see if the gd:etag attribute of the entry is also the document id because that would be a much cleaner way of getting the id.

This code also grabs the document type (document, presentation, or spreadsheet) and takes the document title and cleans it up so it’s suitable to use as an output filename.

    authInfo = getAuthInfo( 'username@gmail.com', 'password', 'My Backup Script', 'writely')
    docListXML = getDocList( authInfo['Auth'])

    docList = minidom.parseString(docListXML)

    for entry in docList.getElementsByTagName('entry'):
        ids = entry.getElementsByTagName('id')
        categories = entry.getElementsByTagName('category')
        titles = entry.getElementsByTagName('title')

        docIDLink = ids[0].firstChild.nodeValue
        fields = docIDLink.split('%3A')
        docID = fields[-1]
        title = titles[0].firstChild.nodeValue
        cleanTitle = re.sub('[^aA-zZ0-9 ]', '', title)
        categoryLabel = categories[0].attributes['label'].value

        downloadDoc(docID, categoryLabel, cleanTitle)

Step 3: Downloading each document

There are three different URLs to use for the three different types of documents.

  • Documents:
    http://docs.google.com/feeds/download/documents/Export?docID=example_document_id&exportFormat=example_format
  • Presentations:
    http://docs.google.com/feeds/download/presentations/Export?docID=example_document_id&exportFormat=example_format
  • Spreadsheets:
    http://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=example_spreadsheet_id&fmcmd=example_format

There are also a number of different export formats for each type of document.

Actually downloading the file is just a matter of sending an authenticated GET request to the appropriate URL and here’s a function to do it:


def downloadDoc(docID, categoryLabel, cleanTitle, auth):
    if categoryLabel == 'document':
        docUrl = 'http://docs.google.com/feeds/download/documents/Export?docID=' + docID + '&exportFormat=doc'
        ext = '.doc'
    elif categoryLabel == 'presentation':
        docUrl = 'http://docs.google.com/feeds/download/presentations/Export?docID=' + docID + '&exportFormat=ppt'
        ext = '.ppt'
    elif categoryLabel == 'spreadsheet':
        docUrl = 'http://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=' + docID + '&fmcmd=4'
        ext = '.xls'

        print "Warning:  Downloading spreadsheets doesn't work yet."
        return
    else:
        print 'Error:  Unknown category label (' + categoryLabel + ')'

    filename = cleanTitle + ext

    print 'Saving "' + filename + '"'

    header = {'Authorization': 'GoogleLogin auth=' + auth}

    req = urllib2.Request( docUrl, None, header)
    res = urllib2.urlopen(req)

    file = open(filename, 'w')
    file.write(res.read())
    file.close()

You just pass the function a document id, categoryLabel (document, presentation, or spreadsheet), and an output filename. This version defaults to Microsoft Word format for documents, Powerpoint for presentations, and Excel for spreadsheets.

This gives us the basic pieces for writing a very simple script to download all of the files from a Google Documents account. I’ve got it setup to run as a nightly cronjob. It’s not what I would consider “production-quality” code but it’s more than enough to give me some peace of mind.

Wishlist for the next version

  • Download spreadsheets
  • Get folder list from Google Docs and put downloaded files into appropriate folders
  • Don’t download files that haven’t changed since the last download