Knowledge Base/Python/General
From The Thalesians
The Zen of Python
What is the Zen of Python? To find out, enter
>>> import this
at the Python interpreter prompt. (This is an Easter egg.) You will see the following:
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
repr() versus str()
The difference between repr() and str() in Python may not be immediately apparent.
According to the documentation, repr() returns the "official" string representation of an object. "If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned." In general, repr(o) should return a string representation of o such that the identity
o == eval(repr(o))
holds. eval() evaluates a string as a Python expression, so when it is given the (official) string representation of an object it returns a new object, equal to the original, constructed from that representation.
On the other hand, str() returns an "informal" string representation of an object. "This differs from repr() in that it does not have to be a valid Python expression: a more convenient or concise representation may be used instead." In general, this representation should be human readable. There is no requirement for the identity
o == eval(str(o))
to hold.
Let us look at a few examples.
print str("Paul's test string")
prints
Paul's test string
while
print repr("Paul's test string")
prints
"Paul's test string"
The latter is a valid Python expression, the former is not.
print str(1.0 / 3.0)
prints
0.333333333333
while
print repr(1.0 / 3.0)
prints
0.33333333333333331
The latter attempts to give enough decimal figures to enable the value to be reconstructed to maximum precision.
It's a bit surprising that
print str([3, "paul's test string", 5.5, "bar", 7, 1.0 / 3.0])
and
print repr([3, "paul's test string", 5.5, "bar", 7, 1.0 / 3.0])
both print
[3, "paul's test string", 5.5, 'bar', 7, 0.33333333333333331]
on ActivePython 2.5.2.2. This is by design: str() of a list calls repr(), not str(), on each element. If it called str(), the string 'bar' would be printed without quotes and the output would no longer show which elements are strings.
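The behaviour of str() on containers can be checked with a class whose two representations differ visibly. The following is a minimal sketch in Python 3 syntax; the Tag class is hypothetical and exists only for this illustration.

```python
class Tag:
    """Hypothetical class whose str() and repr() are easy to tell apart."""

    def __str__(self):
        return "informal"

    def __repr__(self):
        return "Tag()"


t = Tag()
single = str(t)         # calls Tag.__str__
in_list = str([t, t])   # the list formats each element with repr()

print(single)
print(in_list)
```

The first line printed is the informal form, while the list shows the official form of each element.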
Finally, for user-defined classes, repr() calls the __repr__() method, while str() calls the __str__() method. Here is an implementation of a simple class that provides both __repr__() and __str__() and conforms to the requirements imposed by the documentation:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if hasattr(other, "x") and hasattr(other, "y"):
            return (self.x == other.x) and (self.y == other.y)
        else:
            return False

    def __ne__(self, other):
        return not self.__eq__(other)

    def __str__(self):
        return "(%s, %s)" % (str(self.x), str(self.y))

    def __repr__(self):
        return "Point(%s, %s)" % (repr(self.x), repr(self.y))
Thus
pt = Point(3, 5)
print pt
print str(pt)
print repr(pt)
print eval(str(pt)) == pt
print eval(repr(pt)) == pt
prints
(3, 5)
(3, 5)
Point(3, 5)
False
True
The first two lines are identical because print converts its argument with str(), which calls __str__. Notice that the result of repr(pt) can be used to reconstruct the Point object with eval(), whereas eval(str(pt)) yields a plain tuple, which is not equal to pt.
Overridable properties in Python
class Foo(object):
    _a = 7

    def get_a(self):
        return self._a

    def set_a(self, a):
        self._a = a

    A = property(fget=get_a, fset=set_a)

class Bar(Foo):
    _newA = 5

    def get_a(self):
        return self._newA

    def set_a(self, a):
        self._newA = a

f = Foo()
print f.A
b = Bar()
print b.A
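If Foo.get_a is overridden by Bar.get_a we would expect to see the output
7
5
But instead we see
7
7
This is because in the line
A = property(fget=get_a, fset=set_a)
the binding occurs early: fget and fset are bound to the function objects Foo.get_a and Foo.set_a while the body of Foo is being executed, for good. Defining get_a and set_a again in Bar does not change which functions the property calls.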
However, Python enables one to create overridable properties. The following implementation does the trick:
class OProperty(object):
    """Based on the emulation of PyProperty_Type() in Objects/descrobject.c"""

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError, "unreadable attribute"
        if self.fget.__name__ == '<lambda>' or not self.fget.__name__:
            return self.fget(obj)
        else:
            return getattr(obj, self.fget.__name__)()

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError, "can't set attribute"
        if self.fset.__name__ == '<lambda>' or not self.fset.__name__:
            self.fset(obj, value)
        else:
            getattr(obj, self.fset.__name__)(value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError, "can't delete attribute"
        if self.fdel.__name__ == '<lambda>' or not self.fdel.__name__:
            self.fdel(obj)
        else:
            getattr(obj, self.fdel.__name__)()
It was taken from the article An Overridable Alternative to the property Function in Python, where you can find the full details.
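A lighter-weight alternative, when a custom descriptor is more than is needed, is to keep the built-in property but hand it lambdas that look the accessor up on the instance at call time. This is a sketch in Python 3 syntax of that common idiom, not the approach from the article cited above; the class and attribute names are reused from the earlier example.

```python
class Foo:
    _a = 7

    def get_a(self):
        return self._a

    def set_a(self, a):
        self._a = a

    # The lambdas resolve get_a/set_a through self each time the property
    # is used, so an override in a subclass is honoured.
    A = property(lambda self: self.get_a(),
                 lambda self, a: self.set_a(a))


class Bar(Foo):
    _newA = 5

    def get_a(self):
        return self._newA

    def set_a(self, a):
        self._newA = a


f = Foo()
b = Bar()
print(f.A)
print(b.A)
b.A = 11          # goes through Bar.set_a
print(b._newA)
```

The cost is one extra function call per access; the benefit is that no new descriptor class has to be maintained.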
Converting a list to a dict, value to index
mylist = ["foo", "bar", "baz"]
print dict([(mylist[i], i) for i in range(0, len(mylist))])
prints
{'baz': 2, 'foo': 0, 'bar': 1}
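The same mapping can be built without indexing into the list by hand. A minimal sketch in Python 3 syntax, using enumerate and a dict comprehension (the name index_of is made up for this example):

```python
mylist = ["foo", "bar", "baz"]

# enumerate yields (index, value) pairs; swap them to map value -> index.
index_of = {value: i for i, value in enumerate(mylist)}
print(index_of)
```

If the list contains duplicates, the last occurrence wins, exactly as in the range-based version.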
Iterating through all key-value pairs in a dict
Very often we want to iterate through all the key-value pairs in a dict:
d = {"Name": "Paul", "Surname": "Bilokon"}
for key, value in d.items():
    print "%s = %s" % (key, value)
This produces
Surname = Bilokon
Name = Paul
If we just want to iterate through the keys, we use
for key in d.keys():
    print key
If we just want to iterate through the values, we use
for value in d.values():
    print value
What happens if we use the syntax
for x in d:
    print x
Perhaps counterintuitively, this will iterate through the keys, not values.
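Note that the order in which the pairs come out is not something the examples above control. Since Python 3.7 a dict iterates in insertion order; when a fixed order is wanted regardless of interpreter version, the items can be sorted first. A small sketch in Python 3 syntax (the variable lines is introduced only for the illustration):

```python
d = {"Name": "Paul", "Surname": "Bilokon"}

# sorted() orders the (key, value) tuples by key, giving a stable output.
lines = ["%s = %s" % (key, value) for key, value in sorted(d.items())]
for line in lines:
    print(line)
```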
Iterating through two or more lists in parallel
Use zip:
names = ["Isaac", "Carl Friedrich", "Evariste", "John"]
surnames = ["Newton", "Gauss", "Galois", "von Neumann"]
ages = [84, 77, 20, 53]
for n, s, a in zip(names, surnames, ages):
    print "NAME: %s, SURNAME: %s, AGE: %d" % (n, s, a)
The result looks as follows:
NAME: Isaac, SURNAME: Newton, AGE: 84
NAME: Carl Friedrich, SURNAME: Gauss, AGE: 77
NAME: Evariste, SURNAME: Galois, AGE: 20
NAME: John, SURNAME: von Neumann, AGE: 53
zip truncates the results to the length of the shortest list:
exponents = [2, 3, 5, 7, 9]
primes = [3, 7, 31, 127]
print zip(exponents, primes)
for e, p in zip(exponents, primes):
    print "2^%d - 1 ... Mersenne prime: %d" % (e, p)
prints
[(2, 3), (3, 7), (5, 31), (7, 127)]
2^2 - 1 ... Mersenne prime: 3
2^3 - 1 ... Mersenne prime: 7
2^5 - 1 ... Mersenne prime: 31
2^7 - 1 ... Mersenne prime: 127
Alternatively, you can use map(None, exponents, primes). This will pad the shorter lists with None:
print map(None, exponents, primes)
for e, p in map(None, exponents, primes):
    print e, p
The results are as follows:
[(2, 3), (3, 7), (5, 31), (7, 127), (9, None)]
2 3
3 7
5 31
7 127
9 None
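The map(None, ...) form only works in Python 2; in Python 3 map requires a callable and zip returns an iterator rather than a list. A sketch of the equivalent in Python 3, using itertools.zip_longest for the padded variant (the names truncated and padded are introduced here for illustration):

```python
from itertools import zip_longest

exponents = [2, 3, 5, 7, 9]
primes = [3, 7, 31, 127]

# zip stops at the shortest input; wrap it in list() to see the pairs.
truncated = list(zip(exponents, primes))

# zip_longest pads the shorter inputs, with None unless fillvalue is given.
padded = list(zip_longest(exponents, primes))

print(truncated)
print(padded)
```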
Building a dictionary from two lists
This is easy. Use zip or map as shown above:
names = ["Isaac", "Carl Friedrich", "Evariste", "John"]
ages = [84, 77, 20, 53]
print dict(zip(names, ages))
exponents = [2, 3, 5, 7, 9]
primes = [3, 7, 31, 127]
print dict(map(None, exponents, primes))
prints
{'Isaac': 84, 'John': 53, 'Carl Friedrich': 77, 'Evariste': 20}
{9: None, 2: 3, 3: 7, 5: 31, 7: 127}
Conditionals in list comprehensions
It is possible to use if in list comprehensions in two distinct ways. This is best illustrated by examples:
tradeSigns = [-1, 1, 1, -1, -1, -1, 1, -1]
tradeDirections = ["Sell" for ts in tradeSigns if ts == -1]
This has set tradeDirections to
['Sell', 'Sell', 'Sell', 'Sell', 'Sell']
In other words, we have pre-filtered tradeSigns and ignored its elements equal to 1. Thus we skipped the 1's and obtained five elements in the resulting tradeDirections, rather than eight.
We could also do this:
tradeDirections = ["Sell" if ts == -1 else "Buy" for ts in tradeSigns]
In this case tradeDirections is set to
['Sell', 'Buy', 'Buy', 'Sell', 'Sell', 'Sell', 'Buy', 'Sell']
perhaps in line with our original intentions. We didn't pre-filter tradeSigns and processed all its elements (thus we get eight elements in the result) but chose to replace the -1's with "Sell" and the 1's with "Buy".
In each case we used if but resorted to different syntax.
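The two forms can also be combined in one comprehension: the conditional expression before for chooses the value, and the trailing if filters. A sketch in Python 3 syntax; the 0 entry standing for "no trade" is a hypothetical addition to the data, not part of the original example.

```python
# 0 is a made-up "no trade" marker added for this illustration.
tradeSigns = [-1, 1, 1, -1, 0, -1, 1, -1]

# Trailing "if" drops the zeros; the "x if c else y" expression maps the rest.
tradeDirections = ["Sell" if ts == -1 else "Buy"
                   for ts in tradeSigns if ts != 0]
print(tradeDirections)
```

The result has seven elements: the zero was filtered out and every remaining sign was translated.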
Filtering one list by another
Suppose you have defined
names = ["Paul", "Alex", "John", "Simon", "Michael"]
surnames = ["Smith", "Jones", "Taylor", "Williams", "Brown"]
and now you want to print out the surnames of all Pauls. This can be achieved by using list comprehensions:
print [surnames[i] for i in range(len(names)) if names[i] == "Paul"]
will produce the output
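['Smith']
because "Paul" occurs only once in names as defined above. Each further occurrence of "Paul" would add the surname at the same position to the result.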
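The index arithmetic can be avoided by walking the two lists in parallel with zip. A minimal sketch in Python 3 syntax, using the same lists (the name pauls is introduced for the example):

```python
names = ["Paul", "Alex", "John", "Simon", "Michael"]
surnames = ["Smith", "Jones", "Taylor", "Williams", "Brown"]

# Pair each name with its surname, then keep the surnames whose name matches.
pauls = [surname for name, surname in zip(names, surnames) if name == "Paul"]
print(pauls)
```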
Checking if an object is a sequence or is iterable
If o is your object, you can use the following check:
if hasattr(o, "__iter__"):
    # ...
The following code
print hasattr(5, "__iter__")
print hasattr([1, 2, 3, 4, 5], "__iter__")
print hasattr([5], "__iter__")
print hasattr((5), "__iter__")
print hasattr((5,), "__iter__")
print hasattr((3, 2), "__iter__")
print hasattr("asdf", "__iter__")
prints
False
True
True
False
True
True
False
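The last result depends on the Python version: in Python 3 strings do define __iter__, so the hasattr check returns True for "asdf". A version-independent sketch is to ask iter() directly and, where strings should not count as sequences, to exclude them explicitly. The helper names below are hypothetical.

```python
def is_iterable(o):
    """Return True if iter(o) succeeds (covers __iter__ and __getitem__)."""
    try:
        iter(o)
    except TypeError:
        return False
    return True


def is_nonstring_iterable(o):
    """Like is_iterable, but treat str and bytes as scalars."""
    return is_iterable(o) and not isinstance(o, (str, bytes))


samples = [5, [1, 2, 3], (5), (5,), "asdf"]
results = [is_iterable(s) for s in samples]
strict_results = [is_nonstring_iterable(s) for s in samples]
print(results)
print(strict_results)
```

Note that (5) is just the integer 5 in parentheses, which is why it is not iterable, while (5,) is a one-element tuple.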
Implementing functors in Python
Any object with a __call__() method may be called using the function call syntax:
class Scale(object):
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, arg):
        return self.factor * arg

s = Scale(2)
print s(5)
The functor can have more than one argument:
import math

class Pythagoras(object):
    def __init__(self):
        pass

    def __call__(self, arg1, arg2):
        return math.sqrt(arg1 * arg1 + arg2 * arg2)

p = Pythagoras()
print p(3, 4)
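Because a functor is an ordinary callable, it can be passed anywhere a function is expected while still carrying its own state. A short sketch in Python 3 syntax, reusing the Scale class from above (the variable names are made up for the example):

```python
class Scale:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, arg):
        return self.factor * arg


double = Scale(2)

# callable() reports whether an object supports the call syntax.
is_callable = callable(double)

# The instance is passed to map exactly as a plain function would be.
scaled = list(map(double, [1, 2, 3]))
print(is_callable)
print(scaled)
```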
Local variables in lambda expressions
We see that x + y is calculated twice in the following lambda expression:
func1 = lambda x, y, z: (x + y + z) / (x + y - z)
Can we compute it once and make it a local variable? One solution is to use a helper lambda expression:
func2 = lambda x, y, z: (lambda sum=x + y: (sum + z) / (sum - z))()
Now both
print func1(3.0, 5.0, 7.0)
and
print func2(3.0, 5.0, 7.0)
print the same number:
15.0
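Two other ways of naming the intermediate value are sketched below in Python 3 syntax. The first passes x + y as an argument to an inner lambda instead of a default value, which avoids shadowing the built-in sum; the second uses an assignment expression and therefore assumes Python 3.8 or later. The names func3 and func4 are introduced only for this illustration.

```python
# Inner lambda receives the shared sub-expression as its parameter s.
func3 = lambda x, y, z: (lambda s: (s + z) / (s - z))(x + y)

# Assignment expression (Python 3.8+): s is bound while evaluating the
# numerator and reused in the denominator.
func4 = lambda x, y, z: ((s := x + y) + z) / (s - z)

print(func3(3.0, 5.0, 7.0))
print(func4(3.0, 5.0, 7.0))
```

Once a lambda needs local variables, an ordinary def is usually the more readable choice.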
Instantiating a Python object dynamically by object class name
Use eval:
def forname(modname, classname):
    ''' Returns a class of "classname" from module "modname". '''
    module = __import__(modname)
    classobj = getattr(module, classname)
    return classobj

class Foo(object):
    def introduction(self):
        print "I am FOO"

class Bar(object):
    def introduction(self):
        print "I am BAR"

className = "Foo"
o = eval("%s()" % className)
o.introduction()
This will print
I am FOO
If, on the other hand, you set className to "Bar", you will see
I am BAR
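eval executes whatever expression the string contains, so it is best avoided when className may come from outside the program. The example above also defines forname without using it. The sketch below, in Python 3 syntax, shows two eval-free lookups: an explicit registry for classes in the current module, and a forname variant built on importlib.import_module, which also handles dotted module names. The registry dict is a hypothetical addition, and introduction returns its text here so that it can be checked.

```python
import importlib


def forname(modname, classname):
    """Return the class named classname from the module named modname."""
    module = importlib.import_module(modname)
    return getattr(module, classname)


class Foo:
    def introduction(self):
        return "I am FOO"


class Bar:
    def introduction(self):
        return "I am BAR"


# Explicit registry: only the classes listed here can be instantiated by name.
registry = {cls.__name__: cls for cls in (Foo, Bar)}

className = "Foo"
o = registry[className]()
print(o.introduction())

# Looking a class up in another module by name.
ordered_dict_class = forname("collections", "OrderedDict")
print(ordered_dict_class.__name__)
```

An unknown class name raises KeyError (registry) or AttributeError (forname) instead of evaluating arbitrary text.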
Sending the output to STDERR rather than STDOUT
Instead of
print "Hello"
use
import sys
sys.stderr.write("Hello\n")
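In Python 2 the print statement can also be redirected with print >>sys.stderr, "Hello". In Python 3, print is a function and takes the target stream as its file argument. A small sketch; the log_error helper is hypothetical and exists only to make the stream replaceable.

```python
import sys


def log_error(message, stream=None):
    """Print message followed by a newline to stream (sys.stderr by default)."""
    print(message, file=stream if stream is not None else sys.stderr)


log_error("Hello")
```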
Making a path, rather than just making a directory
If we try the following
import os
os.mkdir("foo/bar/baz")
while foo/bar does not exist, foo/bar/baz will never be made. Depending on the operating system, we may see something like
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    os.mkdir("foo/bar/baz")
WindowsError: [Error 3] The system cannot find the path specified: 'foo/bar/baz'
But instead we can use
import distutils.dir_util
distutils.dir_util.mkpath("foo/bar/baz")
mkpath will create baz and any missing ancestor directories. If the directory already exists, it will do nothing.
For more information on the useful module distutils see the official documentation.
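The standard os module offers the same functionality through os.makedirs, which matters on recent interpreters because distutils was removed from the standard library in Python 3.12. A minimal sketch in Python 3 syntax; it works inside a temporary directory so that it leaves nothing behind, and the exist_ok argument assumes Python 3.2 or later.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, "foo", "bar", "baz")

# Creates baz and any missing ancestor directories.
os.makedirs(target, exist_ok=True)

# With exist_ok=True a second call on an existing directory is a no-op.
os.makedirs(target, exist_ok=True)

created = os.path.isdir(target)
print(created)

shutil.rmtree(base)  # clean up the temporary tree
```

Without exist_ok=True, os.makedirs raises an error if the final directory already exists.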
Reading a text file backwards
Reading a text file backwards is a relatively common task. Let me explain first what I mean by backwards: you read the file line by line, starting from the last line and progressing towards the first.
Why would you need this? Imagine that you have a large CSV (comma-separated values) file with numerous records sorted in ascending order by date/time. You want to read the last N records. Using the standard text file input/output machinery you would probably end up reading the entire file and discarding all but the last N records, which is extremely wasteful. Chances are you will have more than one such file.
I have written a Python module to help you: backwards_text_file.py. You can download it from the Downloads page.
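To illustrate the general idea, here is a minimal sketch in Python 3 syntax. It is not the backwards_text_file.py module mentioned above, whose interface is not described here; the function name reverse_lines, its parameters, and the sample data are all assumptions made for this example. It reads fixed-size blocks from the end of the file and yields complete lines, last line first.

```python
import os
import tempfile
from itertools import islice


def reverse_lines(path, block_size=4096, encoding="utf-8"):
    """Yield the lines of a text file from last to first, reading in blocks."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        if position == 0:
            return  # empty file: nothing to yield
        # Ignore one trailing newline so it does not produce an empty line.
        f.seek(position - 1)
        if f.read(1) == b"\n":
            position -= 1
        tail = b""  # end of a line whose beginning lies in an earlier block
        while position > 0:
            read_size = min(block_size, position)
            position -= read_size
            f.seek(position)
            pieces = (f.read(read_size) + tail).split(b"\n")
            tail = pieces[0]  # possibly incomplete; completed by the next block
            for line in reversed(pieces[1:]):
                yield line.rstrip(b"\r").decode(encoding)
        yield tail.rstrip(b"\r").decode(encoding)


# Demonstration on a small temporary file with made-up records.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("2009-01-01,1\n2009-01-02,2\n2009-01-03,3\n2009-01-04,4\n")

gen = reverse_lines(tmp.name, block_size=8)
last_two = list(islice(gen, 2))  # only the tail of the file is read
gen.close()                      # closes the underlying file
print(last_two)
os.remove(tmp.name)
```

Splitting on the newline byte before decoding is safe for UTF-8, because that byte never occurs inside a multi-byte character.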
Formatting exceptions and tracebacks
Sometimes you catch an exception and don't even know what it is:
try:
    1 / 0
except:
    # What kind of exception did we catch?
    pass
Of course, in this case we, the code readers, know that we have a ZeroDivisionError, but 1 / 0 could be a much more complicated code snippet.
The bottom line is, if we catch an exception we want to know what it is (Problem 1) and we want to be able to format it nicely as a string (Problem 2) so that we can log it (for example).
Problem 1 is solved by sys.exc_info():
import sys

try:
    1 / 0
except:
    exceptionType, exceptionValue, exceptionTraceBack = sys.exc_info()
    print exceptionType
    print exceptionValue
    print exceptionTraceBack
The values returned by sys.exc_info() are hardly suitable for human consumption. To solve Problem 2 (pretty formatting), we rely on the traceback module:
import logging
import os
import string
import sys
import traceback

def main(argv):
    try:
        1 / 0
        return 0
    except:
        exceptionType, exceptionValue, exceptionTraceBack = sys.exc_info()
        exceptionLineList = traceback.format_exception_only(exceptionType, exceptionValue)
        # Note: In the vast majority of cases ``exceptionLineList`` will
        # contain a single line
        logging.error(string.join(exceptionLineList, "\n"))
        traceBackLineList = traceback.format_tb(exceptionTraceBack)
        for traceBackLine in traceBackLineList:
            logging.debug(traceBackLine)
        return -1

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)s %(message)s')
    sys.exit(main(sys.argv))
For more information on the various exception printing and formatting tools provided by the traceback module read this.
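In Python 3 the caught exception can be bound with "except ... as", and it carries its own traceback in the __traceback__ attribute, so sys.exc_info() is not needed. The following is a sketch of the same logging pattern under that assumption; the helper name describe_exception is hypothetical.

```python
import logging
import traceback


def describe_exception(exc):
    """Return (one-line summary, list of formatted traceback entries)."""
    summary = "".join(traceback.format_exception_only(type(exc), exc)).strip()
    tb_lines = traceback.format_tb(exc.__traceback__)
    return summary, tb_lines


def main():
    try:
        1 / 0
        return 0
    except Exception as exc:
        summary, tb_lines = describe_exception(exc)
        logging.error(summary)
        for tb_line in tb_lines:
            logging.debug(tb_line.rstrip())
        return -1


logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s")
status = main()
print(status)
```

When the whole traceback should go into a single log record, logging.exception("message") inside the except block is a shorter alternative.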
