Google App Engine feature request form

I was referred to the feature request form for App Engine as part of a support ticket, and hadn’t seen a link to it previously. It may be useful to others, though I think it’s specific to App Engine rather than general to all Google Cloud products.

Cleaning out old data from Google App Engine map reduce

If you’re on Google App Engine and you are looking for a way to do some work over a large set of data in the datastore, there’s a good chance you’ll turn to App Engine Mapreduce. Unfortunately the UI for this tool leaves something (much) to be desired.

The control screen looks something like this after you’ve run a few jobs, especially if you are running pipelines that have a lot of sub-pipelines. All of this is a pain to clean up, as you have to click cleanup next to each entry, and it even annoyingly prompts you with a dialog for each one.

To resolve this issue, you can just delete the data in the datastore directly. Below is a code snippet which you can run through some sort of endpoint to delete the old data:

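from google.appengine.ext import ndb

def do_cleanup():
    # Expando stand-ins for the kinds written by the mapreduce and pipeline
    # libraries. ndb uses the class name as the datastore kind.
    class _AE_Barrier_Index(ndb.Expando):
        pass

    class _AE_MR_MapreduceState(ndb.Expando):
        pass

    class _AE_MR_ShardState(ndb.Expando):
        pass

    class _AE_MR_TaskPayload(ndb.Expando):
        pass

    class _AE_Pipeline_Record(ndb.Expando):
        pass

    class _AE_Pipeline_Slot(ndb.Expando):
        pass

    class _AE_Pipeline_Status(ndb.Expando):
        pass

    class _AE_MR_MapreduceControl(ndb.Expando):
        pass

    class _AE_Pipeline_Barrier(ndb.Expando):
        pass

    to_delete_entities = [
        _AE_Barrier_Index,
        _AE_MR_MapreduceState,
        _AE_MR_ShardState,
        _AE_MR_TaskPayload,
        _AE_Pipeline_Record,
        _AE_Pipeline_Slot,
        _AE_Pipeline_Status,
        _AE_MR_MapreduceControl,
        _AE_Pipeline_Barrier,
    ]

    # Fetch only the keys for each kind and delete them one by one.
    for cls in to_delete_entities:
        for k in cls.query().fetch(keys_only=True):
            k.delete()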
The function defines expando versions of the models the mapreduce library uses so that you don’t have to worry about crazy imports, and then just goes through and deletes all the entities for each type.
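If you have a lot of job history, deleting one key at a time can get slow. Here is a minimal sketch of batching the deletes instead. The helper name delete_in_batches, the default batch size of 500, and the injected delete_fn are my own assumptions for illustration and are not part of the mapreduce library; on App Engine you could pass ndb.delete_multi as delete_fn.

```python
def delete_in_batches(keys, delete_fn, batch_size=500):
    """Call delete_fn on successive slices of keys and return the number handled.

    delete_fn is whatever removes a list of keys. On App Engine that could be
    ndb.delete_multi (an assumption about how you wire this up).
    """
    keys = list(keys)
    deleted = 0
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        delete_fn(batch)
        deleted += len(batch)
    return deleted
```

Inside do_cleanup the inner loop would then become something like delete_in_batches(cls.query().fetch(keys_only=True), ndb.delete_multi).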

Unknown Publisher when Installing ClickOnce VSTO Outlook plugin signed with SHA256 Certificate

I just spent the last day fighting this issue, so I thought I’d post the problem and solution for anyone else who is fighting with it.

Docalytics is building an Outlook plugin to track attachments in sales emails using VSTO (Visual Studio Tools for Office), and we are using ClickOnce for the deployment so that we can get automatic updates. Everything was going swimmingly until I tried to test the installation. When running a copy of the installer locally, the publisher was listed as “Unknown Publisher” even though I was signing the ClickOnce manifests with a certificate from a trusted authority (COMODO RSA Code Signing CA). When trying to install it from the web, it also behaved as if the manifests weren’t signed, giving me errors like the following:

Customization URI: 
Exception: Customized functionality in this application will not work because the certificate used to sign the deployment manifest for Docalytics for Outlook or its location is not trusted. Contact your administrator for further assistance.

************** Exception Text **************
System.Security.SecurityException: Customized functionality in this application will not work because the certificate used to sign the deployment manifest for Docalytics for Outlook or its location is not trusted. Contact your administrator for further assistance.
   at Microsoft.VisualStudio.Tools.Applications.Deployment.ClickOnceAddInTrustEvaluator.VerifyTrustPromptKeyInternal(ClickOnceTrustPromptKeyValue promptKeyValue, DeploymentSignatureInformation signatureInformation, String productName)
   at Microsoft.VisualStudio.Tools.Applications.Deployment.ClickOnceAddInTrustEvaluator.VerifyTrustUsingPromptKey(Uri manifest, DeploymentSignatureInformation signatureInformation, String productName)
   at Microsoft.VisualStudio.Tools.Applications.Deployment.ClickOnceAddInDeploymentManager.VerifySecurity(ActivationContext context, Uri manifest, AddInInstallationStatus installState)
   at Microsoft.VisualStudio.Tools.Applications.Deployment.ClickOnceAddInDeploymentManager.InstallAddIn()
The Zone of the assembly that failed was:

This error was taken from the event log, but a similar (if not identical) error was in the details of the failed installation dialog.

The problem turned out to be a bug in the VSTO runtime that would classify packages signed with SHA256RSA as coming from an unknown publisher, even if the publisher was verified. The issue was resolved in VSTO runtime version 10.0.50325. However, even though I had a later version of the runtime installed on my development box (10.0.50903), I still needed to take the corrective action described in this post by Microsoft describing the issue and this other post describing the resolution in more detail. Special thanks to this StackOverflow question for helping me get to the bottom of the issue.

Library conflict between Datejs and D3

I’ve had fun over the past few days tracking down a problem where D3 Transitions weren’t working correctly. Everything looked right and I was pulling my hair out trying to figure out why the transition didn’t get invoked. Copying the code in question to a separate page (in isolation) showed that the transitions worked fine, so I figured it must be a conflict with something else on the page.

After a couple of hours of deleting things from the page (it’s tough to pull things off because of the tree of dependencies) I figured out the problem was Datejs. A little googling confirmed it. What made this challenging was that there weren’t any errors from the conflict. It just didn’t work.

I’m not clear on what the cause of the problem is (I had already lost enough time), but I ended up switching everything to moment.js. Datejs looks like it’s been dead since 2008 anyway.

Connecting to the Marketo SOAP API using Python and suds

Marketo has a SOAP-based API that allows you to interact with a lot of their data, and while there is a Python library that supports it, the library doesn’t cover nearly all the methods supported in the API.

suds is a Python SOAP library that will read WSDL and provide methods for calling into the service.

Here is an example of making a call to getLead(…):

import hmac
import hashlib
import datetime
import time
from suds.client import Client

def _timedelta_to_seconds(td):
    return int((td.days * 86400 + td.seconds) + 0.5)

def _timezone(utc_offset):
    # Format an offset in seconds as +HH:MM or -HH:MM
    hours = abs(utc_offset) // 3600
    minutes = abs(utc_offset) % 3600 // 60
    sign = '-' if utc_offset < 0 else '+'
    return '%s%02d:%02d' % (sign, hours, minutes)

def _utc_offset(date, use_system_timezone):
    if isinstance(date, datetime.datetime) and date.tzinfo is not None:
        # utcoffset() already includes any DST adjustment
        return _timedelta_to_seconds(date.utcoffset())
    elif use_system_timezone:
        if date.year < 1970:
            # We use 1972 because 1970 doesn't have a leap day (feb 29)
            t = time.mktime(date.replace(year=1972).timetuple())
        else:
            t = time.mktime(date.timetuple())
        if time.localtime(t).tm_isdst: # pragma: no cover
            return -time.altzone
        else:
            return -time.timezone
    else:
        return 0

def _string(d, timezone):
    return ('%04d-%02d-%02dT%02d:%02d:%02d%s' %
            (d.year, d.month, d.day, d.hour, d.minute, d.second, timezone))

def rfc3339(date, utc=False, use_system_timezone=True):
    # Try to convert timestamp to datetime
    try:
        if use_system_timezone:
            date = datetime.datetime.fromtimestamp(date)
        else:
            date = datetime.datetime.utcfromtimestamp(date)
    except TypeError:
        # Not a numeric timestamp, so assume it is already a date object
        pass

    if not isinstance(date, datetime.date):
        raise TypeError('Expected timestamp or date object. Got %r.' %
                        type(date))

    if not isinstance(date, datetime.datetime):
        date = datetime.datetime(*date.timetuple()[:3])
    utc_offset = _utc_offset(date, use_system_timezone)
    if utc:
        # Local time is UTC plus the offset, so subtract it to get UTC
        return _string(date - datetime.timedelta(seconds=utc_offset), 'Z')
    else:
        return _string(date, _timezone(utc_offset))

def sign(message, encryption_key):
    # HMAC-SHA1 of the message, keyed with the Marketo encryption key
    digest = hmac.new(encryption_key, message, hashlib.sha1)
    return digest.hexdigest().lower()

def set_header(client, user_id, encryption_key):
    h = client.factory.create('AuthenticationHeaderInfo')
    h.mktowsUserId = user_id
    h.requestTimestamp = rfc3339(datetime.datetime.now())
    h.requestSignature = sign(h.requestTimestamp + user_id, encryption_key)
    # Attach the authentication header to every call made with this client
    client.set_options(soapheaders=h)

url = 'pointer to your api service here?WSDL'
client = Client(url)

set_header(client, 'your username here', 'your secret key here')

leadKey = client.factory.create('LeadKey')
leadKeyRef = client.factory.create('LeadKeyRef')

leadKey.keyType = leadKeyRef.EMAIL
leadKey.keyValue = ''

print(client.service.getLead(leadKey=leadKey))

Note that some of the above code is copied from or based on the aforementioned marketo-python library.
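The signature is just an HMAC-SHA1 over the timestamp concatenated with the user ID, so you can check that part without touching the network. Here is a small sketch that also runs on Python 3 by encoding everything to bytes first. The UTF-8 encoding, the function name sign_request, and the sample values are my assumptions for illustration, not something taken from the Marketo documentation.

```python
import hashlib
import hmac

def sign_request(timestamp, user_id, encryption_key):
    """Return the lowercase hex HMAC-SHA1 of timestamp + user_id."""
    # hmac needs bytes on Python 3; UTF-8 is an assumption here.
    message = (timestamp + user_id).encode('utf-8')
    key = encryption_key.encode('utf-8')
    return hmac.new(key, message, hashlib.sha1).hexdigest().lower()

# Placeholder values for illustration only.
signature = sign_request('2013-01-01T12:00:00-06:00', 'example_user', 'example_key')
print(signature)
```

The result is a 40-character hex string, which is what goes into requestSignature.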


pdf2htmlEX is a great library for converting PDFs to HTML5 pages that can be viewed directly in browsers and on mobile. Text from the document is extracted as selectable text in the rendered document. It’s a C++ codebase built on top of the Poppler PDF library. Check out an example generated PDF here.


MuPDF is a great command line PDF processing utility from Artifex, the company behind Ghostscript. It provides the ability to extract text, images, and fonts via its pdfextract utility and the ability to render pages of the PDF as images via its pdfdraw utility. (Newer releases fold these into a single mutool command, as mutool extract and mutool draw.)

Install it on Mac OS X (via Homebrew):

brew install mupdf

It is also available on Linux via various packaging systems and on Windows via a separate installer.
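If you want to drive the renderer from a script, something like the following sketch works. It assumes a recent MuPDF where the renderer is invoked as mutool draw (older releases shipped it as pdfdraw), and the helper name render_command, the file names, and the 150 dpi default are hypothetical choices of mine.

```python
import subprocess

def render_command(pdf_path, output_pattern, dpi=150, tool=('mutool', 'draw')):
    """Build the argument list for rendering every page of pdf_path to images.

    output_pattern may contain %d, which MuPDF replaces with the page number.
    """
    return list(tool) + ['-o', output_pattern, '-r', str(dpi), pdf_path]

cmd = render_command('input.pdf', 'page-%d.png')
# Uncomment to actually run it (requires MuPDF on the PATH):
# subprocess.check_call(cmd)
```

Passing tool=('pdfdraw',) builds the equivalent command for the older standalone binary.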

PDFBox for Reading PDFs in Java

Apache has a cool library, PDFBox, that lets you work with PDF documents in pure Java. It allows you to generate PDF documents, read them, extract text, and generate images from the documents. I’ve been using it for thumbnail generation and it has worked well, though I haven’t done performance tests comparing it to other libraries out there.

CrocoDocs to Offer HTML5 PDF/Doc Conversion

CrocoDocs announced that they are offering HTML5 document conversion for PDFs and Word documents. This is interesting to me because it is relevant to my startup, Docalytics.

Setup and Teardown for QUnit Tests

The QUnit documentation talks about how to define setup and teardown methods, but fails to give a code example. Here is a quick one for reference:

module("Module Name", {
	setup: function() {
		// setup logic here
	},
	teardown: function() {
		// teardown logic here
	}
});

test("Some Test", function () {
	ok(true, "test code here");
});

Setup and teardown functions are defined on a per-module basis, via the “lifecycle” object. This object just has methods “setup” and “teardown” as shown above. Pretty simple.