Category Archives: python

Resolving problems with PIP after Upgrading to OS X El Capitan

After I upgraded my Mac to El Capitan, I was having some problems installing new packages. I was getting access denied errors when some packages tried to upgrade (and hence remove) existing packages.

This applies to anyone not using virtualenvs: I had packages installed in the default Python site-packages directory (/Library/Python/2.7/site-packages in my case). This caused problems because El Capitan introduced a feature called System Integrity Protection (also called rootless) that prevents you (even as root via sudo) from modifying files in a number of system directories, and that seemed to be what was blocking pip.

Below are the steps I took to resolve the issue; they should serve as a general outline for resolving it yourself:

  1. Capture a list of all packages you have installed. Use pip freeze > some-file-to-keep-results
  2. Disable System Integrity Protection: reboot into recovery mode (hold Command+R), launch a terminal, run csrutil disable, and reboot back into normal mode.
  3. Uninstall all packages from pip. Use pip freeze | xargs sudo pip uninstall -y or uninstall the packages manually.
  4. Ensure that all the packages in the system site-packages directory (/Library/Python/2.7/site-packages) are gone, removing any that remain manually.
  5. Re-enable System Integrity Protection using the same procedure as step 2, this time with the csrutil enable command.
  6. Once you have rebooted into normal mode again, install a version of Python other than the one that comes with OS X. brew install python will do that if you have the Homebrew package manager installed. This is better for Python development anyway.
  7. Install pip manually by downloading the get-pip.py file from the link and running it with python get-pip.py. You can also install pip via Homebrew, but there are some reasons to do it the manual way.
  8. Finally, and this might not be required in your case, pip still wasn’t available via the shell, so I needed to manually create a command to invoke it. I created a script pip in /usr/local/bin and made it invoke the pip package:
    #!/bin/bash
    python /usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pip/__main__.py "$@"
    

    Finally, I made the script executable with chmod uga+x pip.

  9. Once pip was back in place and working, I reinstalled the packages I previously had with pip install -r some-file-to-keep-results.

    That’s it. Hopefully it’s at least that easy for you.
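
To confirm that step 6 took effect, it can help to check which interpreter python actually resolves to. The snippet below is a minimal sketch and not part of the procedure I followed: looks_like_system_python is a hypothetical helper, and the path prefixes it checks are assumptions based on the locations mentioned above plus the usual home of Apple's framework build.

```python
import sys

# Assumed locations of the Python that ships with OS X; adjust if yours differ.
SYSTEM_PREFIXES = ("/System/Library/", "/Library/Python/", "/usr/bin/")


def looks_like_system_python(path):
    """Return True if `path` sits under one of the assumed system locations."""
    return path.startswith(SYSTEM_PREFIXES)


if __name__ == "__main__":
    # sys.executable is the running interpreter; sys.prefix is its install root.
    print("interpreter: %s" % sys.executable)
    print("prefix: %s" % sys.prefix)
    if looks_like_system_python(sys.prefix) or looks_like_system_python(sys.executable):
        print("This looks like the system Python; Homebrew's may not be first on PATH.")
```

Run it with the same python command you use for pip. If it reports a path under /usr/local, the Homebrew interpreter is the one being picked up.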

RESTful App Engine

Continuing with my trend of posting presentations I gave a while ago, last year at Twin Cities DevFest I gave a presentation about building RESTful JSON services on Google App Engine.

The presentation explains the ideas of REST and covers the following topics:

  • How REST differs from RPC-style APIs
  • The pros and cons of JSON versus XML
  • What HTTP verbs are appropriate for which operations, including PATCH, which is seen less often in the wild
  • What HTTP status codes should be used for which scenarios
  • Tools to use when developing RESTful APIs
  • Python code examples implementing the same API in straight Webapp2, Google Cloud Endpoints, and Webapp2 + Pytracts
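
As a rough illustration of the verb and status-code topics (not taken from the slides), here is a minimal sketch of one common convention, using a plain dict as the data store. The handle function and its signature are hypothetical, and real APIs usually POST to a collection URL rather than to a specific key.

```python
def handle(store, method, key, body=None):
    """Return (status_code, payload) for one request against a dict-backed store."""
    exists = key in store
    if method == "GET":
        # Read: 200 OK, or 404 Not Found.
        return (200, store[key]) if exists else (404, None)
    if method == "POST":
        # Create: 201 Created, or 409 Conflict if the key is taken.
        if exists:
            return 409, None
        store[key] = dict(body or {})
        return 201, store[key]
    if method == "PUT":
        # Full replace: the body becomes the whole resource.
        store[key] = dict(body or {})
        return (200 if exists else 201), store[key]
    if method == "PATCH":
        # Partial update: only the supplied fields change.
        if not exists:
            return 404, None
        store[key].update(body or {})
        return 200, store[key]
    if method == "DELETE":
        # Delete: 204 No Content on success.
        if not exists:
            return 404, None
        del store[key]
        return 204, None
    # Anything else: 405 Method Not Allowed.
    return 405, None
```

The difference between PUT and PATCH shows up in what happens to fields the request leaves out: PUT drops them, PATCH keeps them.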

In the talk I introduce my JSON serialization library Pytracts (which was called ProtoPy at the time of the presentation).

Slides here and video of the presentation here. I’m planning on recording a screencast of the presentation so the audio quality is a bit better.

Data Migration on the App Engine Datastore

I gave a presentation a couple years ago at the Twin Cities DevFest conference and I’ve been meaning to post the slides.

The gist of the talk is that with web frameworks like Rails and Django, data migration is a feature of the data model tools. With App Engine Datastore (now Cloud Datastore) you have to do the work yourself. In the talk I give Python examples of how to update the NDB models and how to use deferred tasks and mapper/mapreduce jobs to update existing entities.
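
A rough, framework-free sketch of the batch pattern this implies: process a limited number of entities per task, bump a version marker, and hand back a position so a follow-up task can continue. Plain dicts stand in for NDB entities here, and migrate_batch, upgrade, and the schema_version field are hypothetical names, not taken from the slides.

```python
CURRENT_VERSION = 2


def upgrade(entity):
    """Bring one entity (a plain dict here) up to CURRENT_VERSION."""
    if entity.get("schema_version", 1) < CURRENT_VERSION:
        # Example change: split a single "name" field into first/last.
        first, _, last = entity.pop("name", "").partition(" ")
        entity["first_name"], entity["last_name"] = first, last
        entity["schema_version"] = CURRENT_VERSION
    return entity


def migrate_batch(entities, start=0, batch_size=100):
    """Upgrade one batch; return the next start index, or None when finished."""
    end = min(start + batch_size, len(entities))
    for i in range(start, end):
        upgrade(entities[i])
    return end if end < len(entities) else None
```

On App Engine, each call would typically run in its own deferred task, with the returned position (a query cursor in practice) passed along to the next task. Because upgrade skips entities that are already current, rerunning a batch after a failure is harmless.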

The slides are here:

http://documents.morlok.net/data-migration-on-the-appengine-datastore

I’m hoping to record myself giving the presentation soon.

Cleaning out old data from Google App Engine map reduce

If you’re on Google App Engine and you are looking for a way to do some work over a large set of data in the datastore, there’s a good chance you’ll turn to App Engine Mapreduce. Unfortunately the UI for this tool leaves something (much) to be desired.

The control screen looks something like this after you’ve run a few jobs, especially if you are running pipelines that have a lot of sub-pipelines. All of this is a pain to clean up, as you have to click cleanup next to each entry, and it even annoyingly prompts you with a dialog for each one.

To resolve this issue, you can just delete the data in the datastore directly. Below is a code snippet which you can run through some sort of endpoint to delete the old data:

from google.appengine.ext import ndb

def do_cleanup():
    class _AE_Barrier_Index(ndb.Expando):
        pass

    class _AE_MR_MapreduceState(ndb.Expando):
        pass

    class _AE_MR_ShardState(ndb.Expando):
        pass

    class _AE_MR_TaskPayload(ndb.Expando):
        pass

    class _AE_Pipeline_Record(ndb.Expando):
        pass

    class _AE_Pipeline_Slot(ndb.Expando):
        pass

    class _AE_Pipeline_Status(ndb.Expando):
        pass

    class _AE_MR_MapreduceControl(ndb.Expando):
        pass

    class _AE_Pipeline_Barrier(ndb.Expando):
        pass

    to_delete_entities = [
        _AE_Barrier_Index,
        _AE_MR_MapreduceState,
        _AE_MR_ShardState,
        _AE_MR_TaskPayload,
        _AE_Pipeline_Record,
        _AE_Pipeline_Slot,
        _AE_Pipeline_Status,
        _AE_MR_MapreduceControl,
        _AE_Pipeline_Barrier
    ]

    for cls in to_delete_entities:
        # Fetch only the keys, then delete them in bulk instead of one call per key
        ndb.delete_multi(cls.query().fetch(keys_only=True))

The function defines expando versions of the models the mapreduce library uses so that you don’t have to worry about crazy imports, and then just goes through and deletes all the entities for each type.
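
If a kind holds many entities, fetching every key in one go can be heavy. Below is a minimal sketch that walks the keys in fixed-size batches, assuming you pass in a callable such as ndb.delete_multi to do the actual deletion. chunked and delete_in_batches are hypothetical helpers, and the batch size of 500 is an arbitrary choice.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def delete_in_batches(keys, delete_batch, size=500):
    """Call delete_batch(list_of_keys) once per chunk; return how many keys were passed."""
    total = 0
    for batch in chunked(keys, size):
        delete_batch(batch)
        total += len(batch)
    return total
```

Inside do_cleanup, that might look like delete_in_batches(cls.query().iter(keys_only=True), ndb.delete_multi).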

Connecting to the Marketo SOAP API using Python and suds

Marketo has a SOAP-based API that allows you to interact with a lot of their data, and while there is a Python library that supports it, the library doesn’t cover nearly all the methods supported in the API.

suds is a Python SOAP library that will read WSDL and provide methods for calling into the service.

Here is an example of calling getLead(…):

 
import hmac
import hashlib
import datetime
import time
from suds.client import Client
 
def _utc_offset(date, use_system_timezone):
    if isinstance(date, datetime.datetime) and date.tzinfo is not None:
        return _timedelta_to_seconds(date.dst() or date.utcoffset())
    elif use_system_timezone:
        if date.year < 1970:
            # We use 1972 because 1970 doesn't have a leap day (feb 29)
            t = time.mktime(date.replace(year=1972).timetuple())
        else:
            t = time.mktime(date.timetuple())
        if time.localtime(t).tm_isdst: # pragma: no cover
            return -time.altzone
        else:
            return -time.timezone
    else:
        return 0
        
def rfc3339(date, utc=False, use_system_timezone=True):
    # Try to convert timestamp to datetime
    try:
        if use_system_timezone:
            date = datetime.datetime.fromtimestamp(date)
        else:
            date = datetime.datetime.utcfromtimestamp(date)
    except TypeError:
        pass
 
    if not isinstance(date, datetime.date):
        raise TypeError('Expected timestamp or date object. Got %r.' %
                        type(date))
 
    if not isinstance(date, datetime.datetime):
        date = datetime.datetime(*date.timetuple()[:3])
    utc_offset = _utc_offset(date, use_system_timezone)
    if utc:
        return _string(date + datetime.timedelta(seconds=utc_offset), 'Z')
    else:
        return _string(date, _timezone(utc_offset))
 
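def _string(d, timezone):
    return ('%04d-%02d-%02dT%02d:%02d:%02d%s' %
            (d.year, d.month, d.day, d.hour, d.minute, d.second, timezone))

# The two helpers below are called above but were missing from this listing;
# these are minimal versions of them.
def _timezone(utc_offset):
    # Format an offset in seconds as +HH:MM or -HH:MM
    hours = abs(utc_offset) // 3600
    minutes = abs(utc_offset) % 3600 // 60
    sign_char = '-' if utc_offset < 0 else '+'
    return '%s%02d:%02d' % (sign_char, hours, minutes)

def _timedelta_to_seconds(td):
    # Whole seconds in a timedelta (microseconds ignored)
    return td.days * 86400 + td.seconds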
            
def sign(message, encryption_key):
    digest = hmac.new(encryption_key, message, hashlib.sha1)
    return digest.hexdigest().lower()
 
def set_header(client, user_id, encryption_key):
    h = client.factory.create('AuthenticationHeaderInfo')
    h.mktowsUserId = user_id
    h.requestTimestamp = rfc3339(datetime.datetime.now())
    h.requestSignature = sign(h.requestTimestamp + user_id, encryption_key)
    client.set_options(soapheaders=h)
 
url = 'pointer to your api service here?WSDL'
client = Client(url)
 
set_header(client, 'your username here', 'your secret key here')
 
leadKey = client.factory.create('LeadKey')
leadKeyRef = client.factory.create('LeadKeyRef')
 
leadKey.keyType = leadKeyRef.EMAIL
leadKey.keyValue = 'bob@dole.com'
 
print(client.service.getLead(leadKey=leadKey))

Note that some of the above code is copied from or based on the aforementioned marketo-python library.
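
The signing step above relies on Python 2 string handling. Below is a sketch of the same timestamp-plus-user-id HMAC-SHA1 signature with explicit byte encoding, written for Python 3. build_auth_fields is a hypothetical helper, the UTF-8 encoding is an assumption, and the timestamp comes from isoformat() on a timezone-aware UTC datetime rather than from the rfc3339 function above.

```python
import datetime
import hashlib
import hmac


def sign(message, encryption_key):
    """Return the lowercase hex HMAC-SHA1 of `message` keyed with `encryption_key`."""
    digest = hmac.new(encryption_key.encode("utf-8"),
                      message.encode("utf-8"),
                      hashlib.sha1)
    return digest.hexdigest()


def build_auth_fields(user_id, encryption_key, now=None):
    """Build the three values that set_header() assigns to the SOAP header."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Drop microseconds so the timestamp has whole-second precision.
    timestamp = now.replace(microsecond=0).isoformat()
    return {
        "mktowsUserId": user_id,
        "requestTimestamp": timestamp,
        "requestSignature": sign(timestamp + user_id, encryption_key),
    }
```

Passing now explicitly makes the output reproducible, which is handy when comparing a signature against one produced by another client.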