Knowledge Base/Python/General

 

From The Thalesians

The Zen of Python

What is the Zen of Python? To find out, enter

>>> import this

at the Python interpreter prompt. (This is an Easter egg.) You will see the following:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

repr() versus str()

The difference between repr() and str() in Python may not be immediately apparent.

According to the documentation, repr() returns the "official" string representation of an object. "If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned." In general, repr(o) should return a string representation of o such that the identity

o == eval(repr(o))

holds. eval() evaluates a string as a Python expression, so applying it to the official string representation of an object should yield a copy of that object.

On the other hand, str() returns an "informal" string representation of an object. "This differs from repr() in that it does not have to be a valid Python expression: a more convenient or concise representation may be used instead." In general, this representation should be human readable. There is no requirement for the identity

o == eval(str(o))

to hold.

Let us look at a few examples.

print str("Paul's test string")

prints

Paul's test string

while

print repr("Paul's test string")

prints

"Paul's test string"

The latter is a valid Python expression, the former is not.

print str(1.0 / 3.0)

prints

0.333333333333

while

print repr(1.0 / 3.0)

prints

0.33333333333333331

The latter attempts to give enough decimal figures to enable the value to be reconstructed to maximum precision.

It's a bit surprising that

print str([3, "paul's test string", 5.5, "bar", 7, 1.0 / 3.0])

and

print repr([3, "paul's test string", 5.5, "bar", 7, 1.0 / 3.0])

both print

[3, "paul's test string", 5.5, 'bar', 7, 0.33333333333333331]

on ActivePython 2.5.2.2. It looks like str() for lists is implemented by calling repr() iteratively on the elements. (Shouldn't it be calling str()?)
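This behaviour is not specific to one interpreter: a list builds both its str() and its repr() from the repr() of its elements, which keeps elements that contain commas or brackets unambiguous. The following is a minimal sketch, written for Python 3 (the snippets above use Python 2 print statements). The Label class is hypothetical and exists only to make the two representations easy to tell apart.

```python
class Label:
    """Hypothetical class whose str() and repr() differ visibly."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __repr__(self):
        return "Label(%r)" % self.text


labels = [Label("a, b"), Label("c")]

# Both conversions of the list use repr() on each element.
as_str = str(labels)
as_repr = repr(labels)

# To get the informal form of each element, convert them explicitly.
joined = ", ".join(str(label) for label in labels)

print(as_str)
print(as_repr)
print(joined)
```

Note how the joined form no longer shows where one element ends and the next begins, which is the ambiguity the list representation avoids.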

Finally, for user-defined classes, repr() calls the __repr__() method, while str() calls the __str__() method. Here is an implementation of a simple class that provides both __repr__() and __str__() and conforms to the requirements imposed by the documentation:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if hasattr(other, "x") and hasattr(other, "y"):
            return (self.x == other.x) and (self.y == other.y)
        else:
            return False

    def __ne__(self, other):
        return not self.__eq__(other)

    def __str__(self):
        return "(%s, %s)" % (str(self.x), str(self.y))

    def __repr__(self):
        return "Point(%s, %s)" % (repr(self.x), repr(self.y))

Thus

pt = Point(3, 5)
print pt
print str(pt)
print repr(pt)
print eval(str(pt)) == pt
print eval(repr(pt)) == pt

prints

(3, 5)
(3, 5)
Point(3, 5)
False
True

The first two lines are identical because print calls str(), and hence __str__(), on the object it is passed. Notice that the result of repr(pt) can be used to reconstruct the Point object with eval(), whereas the result of str(pt) evaluates to a plain tuple, which is not equal to pt.

Overridable properties in Python

class Foo(object):
    _a = 7

    def get_a(self):
        return self._a

    def set_a(self, a):
        self._a = a

    A = property(fget=get_a, fset=set_a)

class Bar(Foo):
    _newA = 5

    def get_a(self):
        return self._newA

    def set_a(self, a):
        self._newA = a

f = Foo()
print f.A

b = Bar()
print b.A

If Foo.get_a is overridden by Bar.get_a we would expect to see the output

7
5

But instead we see

7
7

This is because in line

A = property(fget=get_a, fset=set_a)

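the binding happens early: property() is called while the body of Foo is being executed, so fget and fset are bound to the functions Foo.get_a and Foo.set_a for good. Defining methods with the same names in Bar does not change the function objects stored in the property.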

However, Python enables one to create overridable properties. The following implementation does the trick:

class OProperty(object):
    """Based on the emulation of PyProperty_Type() in Objects/descrobject.c"""

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        if self.fget.__name__ == '<lambda>' or not self.fget.__name__:
            return self.fget(obj)
        else:
            # Look the getter up by name so that overrides are honoured
            return getattr(obj, self.fget.__name__)()

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        if self.fset.__name__ == '<lambda>' or not self.fset.__name__:
            self.fset(obj, value)
        else:
            getattr(obj, self.fset.__name__)(value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        if self.fdel.__name__ == '<lambda>' or not self.fdel.__name__:
            self.fdel(obj)
        else:
            getattr(obj, self.fdel.__name__)()

It was taken from the article An Overridable Alternative to the property Function in Python, where you can find the full details.
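A lighter-weight alternative, which is not from the article cited above, is to keep the built-in property but pass it small lambdas that look the accessor up on the instance each time the property is used. This is only a sketch, written for Python 3, reusing the Foo and Bar classes from the example:

```python
class Foo(object):
    _a = 7

    def get_a(self):
        return self._a

    def set_a(self, a):
        self._a = a

    # The lambdas resolve get_a / set_a on the instance at access time,
    # so a subclass override is picked up.
    A = property(lambda self: self.get_a(),
                 lambda self, value: self.set_a(value))


class Bar(Foo):
    _newA = 5

    def get_a(self):
        return self._newA

    def set_a(self, a):
        self._newA = a


f = Foo()
b = Bar()
print(f.A)
print(b.A)

b.A = 11          # dispatches to Bar.set_a
print(b._newA)
```

The trade-off is an extra function call on every access, in exchange for not needing a custom descriptor class.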

Converting a list to a dict, value to index

mylist = ["foo", "bar", "baz"]
print dict((value, index) for index, value in enumerate(mylist))

prints

{'baz': 2, 'foo': 0, 'bar': 1}

Iterating through all key-value pairs in a dict

Very often we want to iterate through all the key-value pairs in a dict:

d = {"Name": "Paul", "Surname": "Bilokon"}

for key, value in d.items():
    print "%s = %s" % (key, value)

This produces

Surname = Bilokon
Name = Paul

If we just want to iterate through the keys, we use

for key in d.keys():
    print key

If we just want to iterate through the values, we use

for value in d.values():
    print value

What happens if we use the following syntax?

for x in d:
    print x

Perhaps counterintuitively, this will iterate through the keys, not the values.

Iterating through two or more lists in parallel

Use zip:

names = ["Isaac", "Carl Friedrich", "Evariste", "John"]
surnames = ["Newton", "Gauss", "Galois", "von Neumann"]
ages = [84, 77, 20, 53]

for n, s, a in zip(names, surnames, ages):
    print "NAME: %s, SURNAME: %s, AGE: %d" % (n, s, a)

The result looks as follows:

NAME: Isaac, SURNAME: Newton, AGE: 84
NAME: Carl Friedrich, SURNAME: Gauss, AGE: 77
NAME: Evariste, SURNAME: Galois, AGE: 20
NAME: John, SURNAME: von Neumann, AGE: 53

zip truncates the results to the length of the shortest list:

exponents = [2, 3, 5, 7, 9]
primes = [3, 7, 31, 127]

print zip(exponents, primes)

for e, p in zip(exponents, primes):
    print "2^%d - 1 ... Mersenne prime: %d" % (e, p)

prints

[(2, 3), (3, 7), (5, 31), (7, 127)]
2^2 - 1 ... Mersenne prime: 3
2^3 - 1 ... Mersenne prime: 7
2^5 - 1 ... Mersenne prime: 31
2^7 - 1 ... Mersenne prime: 127

Alternatively, you can use map(None, exponents, primes). This will pad the shorter lists with None:

print map(None, exponents, primes)

for e, p in map(None, exponents, primes):
    print e, p

The results are as follows:

[(2, 3), (3, 7), (5, 31), (7, 127), (9, None)]
2 3
3 7
5 31
7 127
9 None
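The map(None, ...) idiom is specific to Python 2. In Python 3 the same padding behaviour is available from itertools.zip_longest, which also accepts a fillvalue argument if None is not a suitable placeholder. A minimal sketch, assuming Python 3:

```python
from itertools import zip_longest

exponents = [2, 3, 5, 7, 9]
primes = [3, 7, 31, 127]

# Pads the shorter list with None, like map(None, ...) in Python 2.
padded = list(zip_longest(exponents, primes))
print(padded)

# A different placeholder can be chosen with fillvalue.
padded_with_zero = list(zip_longest(exponents, primes, fillvalue=0))
print(padded_with_zero)
```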

Building a dictionary from two lists

This is easy. Use zip or map as shown above:

names = ["Isaac", "Carl Friedrich", "Evariste", "John"]
ages = [84, 77, 20, 53]
print dict(zip(names, ages))

exponents = [2, 3, 5, 7, 9]
primes = [3, 7, 31, 127]
print dict(map(None, exponents, primes))

prints

{'Isaac': 84, 'John': 53, 'Carl Friedrich': 77, 'Evariste': 20}
{9: None, 2: 3, 3: 7, 5: 31, 7: 127}

Conditionals in list comprehensions

It is possible to use if in list comprehensions in two distinct ways. This is best illustrated by examples:

tradeSigns = [-1, 1, 1, -1, -1, -1, 1, -1]
tradeDirections = ["Sell" for ts in tradeSigns if ts == -1]

This has set tradeDirections to

['Sell', 'Sell', 'Sell', 'Sell', 'Sell']

In other words, we have pre-filtered tradeSigns and ignored its elements equal to 1. Thus we skipped the 1's and obtained five elements in the resulting tradeDirections, rather than eight.

We could also do this:

tradeDirections = ["Sell" if ts == -1 else "Buy" for ts in tradeSigns]

In this case tradeDirections is set to

['Sell', 'Buy', 'Buy', 'Sell', 'Sell', 'Sell', 'Buy', 'Sell']

perhaps in line with our original intentions. We didn't pre-filter tradeSigns and processed all its elements (thus we get eight elements in the result) but chose to replace the -1's with "Sell" and the 1's with "Buy".

In each case we used if but resorted to different syntax.

Filtering one list by another

Suppose you have defined

names = ["Paul", "Alex", "John", "Simon", "Paul"]
surnames = ["Smith", "Jones", "Taylor", "Williams", "Brown"]

and now you want to print out the surnames of all Pauls. This can be achieved by using list comprehensions:

print [surnames[i] for i in range(len(names)) if names[i] == "Paul"]

will produce the output

['Smith', 'Brown']
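The same filter can be written without indexing by pairing the two lists with zip. This is a sketch for Python 3; the sample data is chosen so that two entries match, and the variable name paul_surnames is introduced here for illustration.

```python
names = ["Paul", "Alex", "John", "Simon", "Paul"]
surnames = ["Smith", "Jones", "Taylor", "Williams", "Brown"]

# Walk the two lists in parallel and keep the surname when the name matches.
paul_surnames = [surname
                 for name, surname in zip(names, surnames)
                 if name == "Paul"]
print(paul_surnames)
```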

Checking if an object is a sequence or is iterable

If o is your object, you can use the following check:

if hasattr(o, "__iter__"):
    # ...

The following code

print hasattr(5, "__iter__")
print hasattr([1, 2, 3, 4, 5], "__iter__")
print hasattr([5], "__iter__")
print hasattr((5), "__iter__")
print hasattr((5,), "__iter__")
print hasattr((3, 2), "__iter__")
print hasattr("asdf", "__iter__")

prints

False
True
True
False
True
True
False
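Note that the last result depends on the Python version: in Python 2 strings can be iterated over but have no __iter__ attribute, whereas in Python 3 they do have one, so the hasattr check returns True for "asdf". A sketch for Python 3 that makes the treatment of strings explicit follows; the helper names is_iterable and is_non_string_iterable are hypothetical.

```python
def is_iterable(o):
    """Return True if iter(o) succeeds."""
    try:
        iter(o)
    except TypeError:
        return False
    return True


def is_non_string_iterable(o):
    """Like is_iterable, but treat str and bytes as scalar values."""
    return is_iterable(o) and not isinstance(o, (str, bytes))


for candidate in [5, [1, 2, 3], (5), (5,), "asdf"]:
    print(repr(candidate), is_iterable(candidate),
          is_non_string_iterable(candidate))
```

Calling iter() also covers objects that only support the older sequence protocol through __getitem__, which a plain attribute check misses.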

Implementing functors in Python

Any object with a __call__() method may be called using the function call syntax:

class Scale(object):
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, arg):
        return self.factor * arg

s = Scale(2)

print s(5)

The functor can have more than one argument:

import math

class Pythagoras(object):
    def __init__(self):
        pass

    def __call__(self, arg1, arg2):
        return math.sqrt(arg1 * arg1 + arg2 * arg2)

p = Pythagoras()

print p(3, 4)
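Because a functor is called like a function, it can be passed wherever a function is expected, for example to map. A short sketch for Python 3, reusing the Scale class from above (the variable names double and doubled are introduced here for illustration):

```python
class Scale(object):
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, arg):
        return self.factor * arg


double = Scale(2)

# callable() reports whether an object supports the call syntax.
print(callable(double))

# The functor carries its own state (the factor) into map.
doubled = list(map(double, [1, 2, 3]))
print(doubled)
```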

Local variables in lambda expressions

We see that x + y is calculated twice in the following lambda expression:

func1 = lambda x, y, z: (x + y + z) / (x + y - z)

Can we compute it once and make it a local variable? One solution is to use a helper lambda expression:

func2 = lambda x, y, z: (lambda sum=x + y: (sum + z) / (sum - z))()

Now both

print func1(3.0, 5.0, 7.0)

and

print func2(3.0, 5.0, 7.0)

print the same number:

15.0
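For comparison, here are two other ways of naming the intermediate sum. This is a sketch that assumes Python 3.8 or later, where assignment expressions are available; the names func3, func4 and total are introduced here and are not part of the original example.

```python
# An assignment expression binds the sum inside the lambda itself.
# The left operand of / is evaluated first, so total is set before use.
func3 = lambda x, y, z: ((total := x + y) + z) / (total - z)


# An ordinary function is often the more readable choice.
def func4(x, y, z):
    total = x + y
    return (total + z) / (total - z)


print(func3(3.0, 5.0, 7.0))
print(func4(3.0, 5.0, 7.0))
```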

Instantiating a Python object dynamically by object class name

Use eval:

def forname(modname, classname):
    ''' Returns a class of "classname" from module "modname". '''
    module = __import__(modname)
    classobj = getattr(module, classname)
    return classobj

class Foo(object):
    def introduction(self):
        print "I am FOO"

class Bar(object):
    def introduction(self):
        print "I am BAR"

className = "Foo"
o = eval("%s()" % className)
o.introduction()

This will print

I am FOO

If, on the other hand, you set className to "Bar", you will see

I am BAR
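eval will execute whatever expression it is given, so it is best avoided when the class name comes from outside the program. The sketch below, written for Python 3, shows two alternatives under stated assumptions: an explicit dictionary of allowed classes (the CLASSES registry and the instantiate helper are hypothetical names), and a forname variant that uses importlib.import_module and getattr. Unlike the original, introduction() returns the string rather than printing it.

```python
import importlib


def forname(modname, classname):
    """Return the class named classname from the module named modname."""
    module = importlib.import_module(modname)
    return getattr(module, classname)


class Foo(object):
    def introduction(self):
        return "I am FOO"


class Bar(object):
    def introduction(self):
        return "I am BAR"


# Hypothetical registry: only the classes listed here can be instantiated.
CLASSES = {"Foo": Foo, "Bar": Bar}


def instantiate(class_name):
    return CLASSES[class_name]()


print(instantiate("Foo").introduction())
print(instantiate("Bar").introduction())

# forname works for classes that live in an importable module.
half = forname("fractions", "Fraction")(1, 2)
print(half)
```

An unknown name raises KeyError in the registry version instead of evaluating arbitrary code.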

Sending the output to STDERR rather than STDOUT

Instead of

print "Hello"

use

import sys

sys.stderr.write("Hello\n")

Making a path, rather than just making a directory

If we try the following

import os

os.mkdir("foo/bar/baz")

while foo/bar does not exist, the call fails and foo/bar/baz is not created. Depending on the operating system, we may see something like

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    os.mkdir("foo/bar/baz")
WindowsError: [Error 3] The system cannot find the path specified: 'foo/bar/baz'

But instead we can use

import distutils.dir_util

distutils.dir_util.mkpath("foo/bar/baz")

mkpath will create baz and any missing ancestor directories. If the directory already exists, it will do nothing.

For more information on the useful module distutils see the official documentation.
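The standard os module offers the same functionality through os.makedirs, which matters because distutils has been removed from recent Python 3 releases. The sketch below assumes Python 3.2 or later (for the exist_ok argument) and works inside a temporary directory so that it does not touch the current one; the variable names base and target are introduced here for illustration.

```python
import os
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, "foo", "bar", "baz")

# Creates baz and any missing ancestor directories.
os.makedirs(target, exist_ok=True)

# With exist_ok=True a second call on an existing directory is not an error.
os.makedirs(target, exist_ok=True)

print(os.path.isdir(target))
```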

Reading a text file backwards

Reading a text file backwards is a relatively common task. Let me explain first what I mean by backwards: you read the file line by line, starting from the last line and progressing towards the first.

Why would you need this? Imagine that you have a large CSV (comma-separated values) file with numerous records sorted in ascending order by date/time. You want to read the last N records. Using the standard text file input/output machinery you would probably end up reading the entire file and discarding all but the last N records, which is extremely wasteful, especially if you have more than one such file.

I have written a Python module to help you: backwards_text_file.py. You can download it from the Downloads page.
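To illustrate the idea only, here is a minimal sketch of one way to read lines from the end of a file. It is not the backwards_text_file.py module mentioned above, and its interface is an assumption. It is written for Python 3, assumes "\n" line endings and an encoding (such as UTF-8) in which the newline byte never occurs inside another character. The function name read_lines_backwards and its parameters are hypothetical.

```python
import itertools
import os
import tempfile


def read_lines_backwards(path, block_size=4096, encoding="utf-8"):
    """Yield the lines of a text file from last to first.

    The file is read in fixed-size blocks starting from the end, so only
    the part that is actually consumed has to be read from disk.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        tail = b""       # start of a line whose beginning is not read yet
        at_end = True    # True until the first piece has been examined
        while position > 0:
            read_size = min(block_size, position)
            position -= read_size
            f.seek(position)
            pieces = (f.read(read_size) + tail).split(b"\n")
            tail = pieces[0]  # may be incomplete; finish it next round
            for piece in reversed(pieces[1:]):
                if at_end:
                    at_end = False
                    if piece == b"":
                        continue  # the file ends with a newline
                yield piece.decode(encoding)
        if tail or not at_end:
            yield tail.decode(encoding)


# Usage: take the last two records of a small sample file.
handle, sample_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(handle, "wb") as sample:
    sample.write(b"2009-01-01,1\n2009-01-02,2\n2009-01-03,3\n")

last_two = list(itertools.islice(read_lines_backwards(sample_path), 2))
all_reversed = list(read_lines_backwards(sample_path, block_size=3))
os.remove(sample_path)

print(last_two)
```

Because the function is a generator, stopping after N lines means that only the blocks containing those lines are read.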

Formatting exceptions and tracebacks

Sometimes you catch an exception and don't even know what it is:

try:
    1 / 0
except:
    # What kind of exception did we catch?
    pass

Of course, in this case we, the code readers, know that this is a ZeroDivisionError, but 1 / 0 could be a much more complicated code snippet.

The bottom line is, if we catch an exception we want to know what it is (Problem 1) and we want to be able to format it nicely as a string (Problem 2) so that we can log it (for example).

Problem 1 is solved by sys.exc_info():

import sys

try:
    1 / 0
except:
    exceptionType, exceptionValue, exceptionTraceBack = sys.exc_info()
    print exceptionType
    print exceptionValue
    print exceptionTraceBack

The values returned by sys.exc_info() are hardly suitable for human consumption. To solve Problem 2 (pretty formatting), we rely on the traceback module:

import logging
import sys
import traceback

def main(argv):
    try:
        1 / 0
        return 0
    except:
        exceptionType, exceptionValue, exceptionTraceBack = sys.exc_info()
        exceptionLineList = traceback.format_exception_only(exceptionType, exceptionValue)
        # Note: In the vast majority of cases ``exceptionLineList`` will
        # contain a single line. Each line already ends with a newline,
        # so the lines are joined without an extra separator.
        logging.error("".join(exceptionLineList).rstrip())
        traceBackLineList = traceback.format_tb(exceptionTraceBack)
        for traceBackLine in traceBackLineList:
            logging.debug(traceBackLine)
        return -1

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)s %(levelname)s %(message)s')
    sys.exit(main(sys.argv))

For more information on the various exception printing and formatting tools provided by the traceback module read this.
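In Python 3 the caught exception can be bound to a name directly, which makes the call to sys.exc_info() unnecessary in simple cases. The following sketch assumes Python 3; describe_failure is a hypothetical helper that returns a one-line summary and the full formatted traceback.

```python
import traceback


def describe_failure(func):
    """Call func(); on failure return (summary, full traceback text)."""
    try:
        func()
    except Exception as exc:
        # One-line description such as "ZeroDivisionError: ...".
        summary = "".join(
            traceback.format_exception_only(type(exc), exc)).strip()
        # Full traceback of the exception currently being handled.
        details = traceback.format_exc()
        return summary, details
    return None, None


summary, details = describe_failure(lambda: 1 / 0)
print(summary)
print(details)
```

Catching Exception rather than using a bare except leaves KeyboardInterrupt and SystemExit alone, which is usually what is wanted when logging errors.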

 
 