Chief trainer for Mutable Minds.
Certified Public Accountant, Retired
Python guru. Alloy & TLA⁺ enthusiast.
Aspiring pianist. Former pilot.
Born at 320 ppm CO₂.
#Python
tip: If you were to devote a little effort to mastering just one itertool, it should be islice().
Its motivation was to be an analogue of sequence slicing but with next-in-a-loop as the foundation rather than __getitem__() and __len__().
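A small sketch of that analogy, assuming nothing beyond the standard library (the sample string and variable names are made up for illustration):

```python
from itertools import count, islice

letters = 'abcdefgh'

# Sequence slicing relies on __getitem__() and __len__().
sliced = letters[2:6:2]

# islice() accepts the same start/stop/step but only ever calls next().
lazily_sliced = list(islice(letters, 2, 6, 2))

# Because of that, it also works on iterators that have no length at all.
first_evens = list(islice(count(0, 2), 5))

print(sliced, lazily_sliced, first_evens)
```

One difference worth remembering: islice() does not accept negative indices, since an iterator has no known end to count back from.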
#Python
tip: If your data is in a dictionary, perhaps from a JSON file, then str.format_map() is your new best friend.
Pretty way:
print('{name} works at {status} {company}'.format_map(info))
Less pretty:
print(f'{info["name"]} works at {info["status"]} {info["company"]}')
Woohoo!
#Python
3.8 was just released. And on the same day that PyPI hit 200,000 packages :-)
This one really is the best Python ever. Download it now.
#Python
3.8 news: F-strings now have a "=" specifier that prints an expression and its result. Very handy for diagnostic prints:
>>> print(f'{alpha=} {beta=} {betavariate(alpha, beta)=:.3f}')
alpha=3 beta=7 betavariate(alpha, beta)=0.256
#Python
2.7 reaches end-of-life at the end of 2019. Afterwards, the interpreter will not get updates -- no bugfixes, no security fixes, nothing.
In the interim, the Python ecosystem is abandoning 2.7 support.
Don't wait to switch to Python 3.
Computer experts on TV and Movies seem to magically know things. You don't see them reading books, attending conferences, contributing to open source, debugging code, taking classes, working collaboratively, refactoring, or using StackOverflow.
Writing production
#python
code:
* Simplest thing that works
* Consider type signatures
* Add repr, copy, and pickle support.
* Watch out for circular references
* Think hard about concurrency
* Handle exotic and unanticipated use cases
* Learn from the experience
* Repeat
#Python
tip: Regular expressions can be challenging to learn, to read, and to debug. My friend
@r1chardj0n3s
wrote a parse module that is much easier to work with.
It's suitable for everyday parsing tasks when you don't need the full power of regexes:
Over time, the
#python
world has shown increasing preference for double quotes: "hello" versus 'hello'.
Perhaps, this is due to the persistent influence of JSON, PyCharm, Black, and plain English.
In contrast, the interpreter itself prefers single quotes:
>>> "hello"
'hello'
#Python
factlet: All objects are born true. They have to learn to be false either with a __bool__() method that returns False or a __len__() method that returns 0.
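A quick illustration of the factlet. The three class names are hypothetical and exist only for this sketch:

```python
class Plain:
    "No __bool__() and no __len__(), so instances are true."

class NeverTrue:
    def __bool__(self):
        return False

class Empty:
    def __len__(self):
        return 0

truth_values = [bool(Plain()), bool(NeverTrue()), bool(Empty())]
print(truth_values)
```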
Today's super-easy
#python
student question.
Q. What are the best practices for modifying a list while looping over it?
A. Don't.
Seriously, just make a new list and avoid hard-to-read code with hard-to-find bugs.
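Here is a minimal sketch of what goes wrong, using made-up data. Removing inside the loop shifts the list under the iterator, so some elements are never examined:

```python
readings = [3, -1, -4, 1, -5, -9, 2]

# Buggy: each remove() shifts later items left, so the loop skips the
# element that slides into the slot just visited.
buggy = readings.copy()
for x in buggy:
    if x < 0:
        buggy.remove(x)

# Clear and correct: build a new list.
clean = [x for x in readings if x >= 0]

print(buggy)
print(clean)
```

The buggy version silently leaves some negative values behind, which is exactly the kind of hard-to-find bug the answer warns about.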
#python
threading tip: To loop over a dictionary in isolation from other threads, prefer:
for k, v in d.copy().items():
    ...
instead of:
for k, v in list(d.items()):
    ...
The copy() is atomic and avoids:
RuntimeError: dictionary changed size during iteration
#Python
tip: Slices are objects.
You can store them in variables or containers just as you would with a regular integer index.
>>> s = slice(6, 12)
>>> title = "Monty python's flying circus"
>>> title[s]
'python'
#python
tip: It is easy to capture the output of functions that print to the terminal:
with open('help.txt', 'w') as f:
    with contextlib.redirect_stdout(f):
        help(pow)
#Python
optimization news: Inline caching has been a huge success. In 3.9, access to builtins and globals sped up considerably. In 3.10, regular attribute access and access to __slots__ are also faster. Most everyday Python programs will benefit. This is a huge win.
#python
tip: list.remove() deletes the first match and then shifts all subsequent data one position to the left. In a loop, that gives quadratic behavior:
There's a better way to remove all matches:
data[:] = [elem for elem in data if elem != target]
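For comparison, the quadratic loop might look like the while-loop below. That part is a guess at the kind of code being warned about, and the sample data is invented:

```python
data = [1, 0, 2, 0, 3, 0]
target = 0

# Quadratic: every "in" test rescans from the start, and every remove()
# scans again and then shifts the tail of the list.
slow = data.copy()
while target in slow:
    slow.remove(target)

# Linear: one pass collects the survivors. Slice assignment updates the
# existing list object, so other references to it see the change.
fast = data.copy()
alias = fast
fast[:] = [elem for elem in fast if elem != target]

print(slow, fast, alias is fast)
```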
If every core developer had to periodically teach a series of Python courses to experienced engineers, perhaps
#python
development would become more user-centric and there would be less of a race to add arguably unnecessary complexity to the language.
Welp, after one year of lockdown, today was the day it finally happened. I left my microphone unmuted during a 10 minute break in a
#Python
training course.
So now a whole class heard me sing Willie Nelson's "Good Hearted Woman" to my wife. 😲
#lockdownfail
#hotmike
Thought for the day: The Python language, standard library, and ecosystem are so large that all users, even core developers and book authors, must content themselves with only knowing a fraction of it.
#Python
tip: The list.insert() method can add elements one at a time, but slicing can more clearly and efficiently insert multiple elements at a time.
>>> s = ['three', 'four', 'five']
>>> s[0:0] = ['one', 'two']
>>> s
['one', 'two', 'three', 'four', 'five']
#Python
3.8 news: The typing module now makes it possible to annotate non-homogenous dictionaries:
>>> class Person(TypedDict):
...     name : str
...     age : int
...
>>> p = Person(name='matthew', age=10)
>>> p
{'name': 'matthew', 'age': 10}
My advice on learning new
#Python
features: Don't approach it from the "I like it" or "I don't like it" point of view. Once approved, it is just a fact of life, neither good nor bad.
Learn it deeply and see where it fits or doesn't fit in your life. Then use as needed. 🙂
Another interesting use of
#python
's walrus operator is to compute differences between successive values in a data stream. This is the inverse of accumulate():
>>> data = [10, 14, 34, 49, 70, 77]
>>> prev = 0; [-prev + (prev := x) for x in data]
[10, 4, 20, 15, 21, 7]
#Python
tip: The any() and all() builtins have short-circuiting behavior, but that is lost if you use a list comprehension:
all( for interface in interfaces) # Good
all([ for interface in interfaces]) # Not so good
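To see the difference, here is a sketch that records which items actually get tested. The is_up() predicate and the interface names are hypothetical stand-ins:

```python
calls = []

def is_up(interface):
    "Hypothetical check that also records each call."
    calls.append(interface)
    return interface != 'eth1'

interfaces = ['eth0', 'eth1', 'eth2', 'eth3']

# Generator expression: all() stops at the first false result.
calls.clear()
all(is_up(i) for i in interfaces)
lazy_calls = list(calls)

# List comprehension: every item is tested before all() even starts.
calls.clear()
all([is_up(i) for i in interfaces])
eager_calls = list(calls)

print(lazy_calls)
print(eager_calls)
```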
#python
3.8 news: The second alpha release is out today. Please try it out.
One major feature that we've needed for a long time is shared memory for multiprocessing. Our story for multi-core just got a lot better ;-)
Thank you Davin Potts!
The
#python
walrus operator can be used to accumulate data:
>>> data = [10, 4, 20, 15, 21, 7]
>>> c=0; [(c := c + x) for x in data]
[10, 14, 34, 49, 70, 77]
How long before someone proposes in-place variants?
[(c :+= x) for x in data]
#Python
tip: Don't unnecessarily prefix function names with "get".
In the sys module, getsizeof() should have been named sizeof(). The "get" is useless baggage.
#python
tip: itertools.repeat() is faster than range() for looping a fixed number of times when you don't need the loop variable.
min(random() for i in range(10_000)) # 1.03 msec per loop
min(random() for _ in repeat(None, 10_000)) # 841 usec per loop
#Python
assignment targets are remarkably general, even in cases where they don't make much sense.
>>> s = [10, 20]
>>> for s[0] in range(5):
...     print(s)
...
[0, 20]
[1, 20]
[2, 20]
[3, 20]
[4, 20]
#python
tip: Now that regular dicts retain insertion order, the fastest way to eliminate hashable duplicates while retaining order is:
>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
Relatively unknown
#Python
function that everyone needs sooner or later: textwrap.dedent().
This gem removes the common leading whitespace from indented multiline strings.
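A typical use is keeping a triple-quoted string indented with the surrounding code. The usage() function and its text are invented for this sketch:

```python
from textwrap import dedent

def usage():
    # The text is indented to match the code, then dedent() strips
    # the whitespace that every line has in common.
    return dedent('''\
        Usage: tool [options]
          -h  show help
        ''')

message = usage()
print(message)
```

Note that the relative indentation of the second line survives; only the shared prefix is removed.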
#Python
news: Tomorrow, we'll hit 200,000 packages in the
#PyPI
package index. We have a rich ecosystem that doubled in size in a little over 2½ years.
The 100,000 mark was on March 4, 2017:
#Python
news: The video is up for my
#PyConEstonia
keynote:
"Object Oriented Programming from scratch (four times)"
Hope this expands your world view.
Enjoy!
#Python
3.8 news: We now have a Euclidean distance function in the math module :-)
Now, k-nearest neighbors boils down to:
nsmallest(k, training_data, partial(dist, new_point))
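Spelled out with imports, that one-liner might run like this. The training points are toy data made up for the sketch:

```python
from functools import partial
from heapq import nsmallest
from math import dist

# Toy 2-D points (invented for illustration).
training_data = [(1.0, 1.0), (2.0, 2.0), (8.0, 9.0), (9.0, 8.0), (1.5, 0.5)]
new_point = (1.0, 0.0)
k = 2

# partial(dist, new_point) is a key function that measures the distance
# from new_point to each candidate.
nearest = nsmallest(k, training_data, partial(dist, new_point))
print(nearest)
```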
Today is my first day to teach
#Python
with version 3.11.
The improved error messages are a real joy.
When the final release happens, likely later today, do yourself a favor and upgrade immediately.
It is mostly better, faster, and stronger but doesn't cost $6 million dollars.
#Python
tip: Database queries are especially fast if you copy the database into RAM:
import sqlite3
source = sqlite3.connect('main_database.db')
dest = sqlite3.connect(':memory:')
source.backup(dest)
#Python
tip: Adding __slots__ to a class is the easiest optimization you will ever do.
* Faster instantiation: c=C()
* Faster attribute reads and writes: c.x=1
* Smaller object size
* Makes compiled language people happy
(Java and C++ classes declare attributes up front)
#python
tip: __slots__ is underappreciated
* Declares attributes upfront for readability
* Saves significant space
* Speeds up attribute access
* Detects misspelled assignments
It has a somewhat high payoff for so little work invested
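A minimal sketch of the declaration and of the misspelling check. The Point class is a made-up example:

```python
class Point:
    __slots__ = ('x', 'y')      # attributes declared up front

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)
p.x = 10                        # declared attribute: fine

try:
    p.z = 5                     # undeclared (or misspelled) attribute
except AttributeError:
    misspelling_caught = True
else:
    misspelling_caught = False

# Slotted instances carry no per-instance __dict__, which is where
# the space savings come from.
has_instance_dict = hasattr(p, '__dict__')
print(misspelling_caught, has_instance_dict)
```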
#Python
news: It was always awkward to write a type annotation for methods that returned self (an instance of the current class). As of yesterday, typing.Self was added to make this much easier and more readable.
It is a big win.
#Python
news: Guido accepted PEP 572. Python now has assignment expressions.
if (match := (data)) is not None:
    print((1))
filtered_data = [y for x in data if (y := f(x)) is not None]
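A runnable sketch of both patterns. The regex, the sample string, and the function f() are assumptions chosen for illustration, not necessarily what the announcement used:

```python
import re

data = 'order id=4521 shipped'

# Bind the match object and test it in a single expression.
order_id = None
if (match := re.search(r'id=(\d+)', data)) is not None:
    order_id = match.group(1)

def f(x):
    "Hypothetical transform that returns None for values to skip."
    return x * 2 if x % 2 else None

# Call f() once per element and keep only the non-None results.
filtered_data = [y for x in range(6) if (y := f(x)) is not None]

print(order_id, filtered_data)
```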
#python
list comparison techniques:
# identity matters
a is b
# order matters; duplicates matter
a == b
# order ignored; duplicates matter
Counter(a) == Counter(b)
# both order and duplicates ignored
set(a) == set(b)
#Python
floating point ninja tip: Summation accuracy is improved when terms are arranged in order of increasing magnitude.
Instead of:
giant + large + medium + small + tiny
Write:
tiny + small + medium + large + giant
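A small demonstration with values picked so the effect is visible. The additions are written out explicitly with + because the built-in sum() may apply its own compensation in newer Python versions:

```python
giant = 1e16
tiny = 1.0

# Largest first: each tiny term is lost to rounding as it is added.
big_first = (giant + tiny) + tiny

# Smallest first: the tiny terms combine before meeting the giant.
small_first = (tiny + tiny) + giant

print(big_first)
print(small_first)
print(small_first - big_first)
```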
Here's the video for my
#PyBay2019
keynote, "The Mental Game of Python: 10 strategies for managing complexity."
Skip the intro. The meat starts at 2:40. There is only 1 slide with all 10 techniques. The rest is live coding with non-trivial examples.
#python
tip: How do you combine a list of lists into a single set?
>>> lol = [['a', 'b', 'c'], ['b', 'c', 'd'], ['d', 'e']]
>>> set().union(*lol)
{'a', 'c', 'b', 'e', 'd'}
#Python
tip: Counters can easily be converted to and from regular dictionaries:
>>> c = Counter(a=10, b=5)
>>> d = dict(c)
>>> d
{'a': 10, 'b': 5}
>>> Counter(d)
Counter({'a': 10, 'b': 5})
Both directions use fast C code that copies the hash tables and updates ref counts.
According to
#Python
's datetime module, I am 20,454 days old today.
My birthday plans include sheltering in place, eating a chocolate cupcake, and making an almost trivial commit to the random module.
Here's a PDF for my
#Python
#PyConIT2022
talk: Structural Pattern Matching in the Real World: New tooling, real code, problems solved.
This is intermediate and advanced level Structural Pattern Matching.
tl;dr The “good stuff” is in section 1.2
#Python
tip: None and NotImplemented are singletons.
PEP 8 advises that comparisons for singletons should always be done with "is" or "is not", never the equality operators.
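One reason for that advice is that == can be overridden. The AlwaysEqual class below is a contrived example:

```python
class AlwaysEqual:
    "Contrived class whose __eq__() claims equality with everything."
    def __eq__(self, other):
        return True

obj = AlwaysEqual()

equality_says = (obj == None)   # misleading: __eq__() answers True
identity_says = (obj is None)   # reliable: identity cannot be overridden

print(equality_says, identity_says)
```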
Am thinking of organizing a
#StackOverflow
conference.
Any time two people start talking about something interesting, I'll have moderators break-up their group because similar conversations have happened before. 😉
#Python
wish: For the sake of users without command-line experience, tools like pip should be runnable from within a Python interactive shell.
The issue is particularly acute on Windows where finding executables and setting the path can be a challenge.
The instructions for how to successfully use
#Python
's super() have been my most successful bit of technical writing.
It has had over a half-million readers. If it were a book, it might have been a New York Times bestseller 🥴
#Python
3.8 news: We now have a prod() function to complement the existing sum() function:
>>> prior = 0.8
>>> likelihoods = [0.625, 0.84, 0.30]
>>> prod(likelihoods, start=prior)
0.126
#Python
tip: Start 2024 right by mastering the itertools module.
I recommend working through the itertool recipes one at a time until you understand how each of them works.
After an hour or two, you will be an itertool boss.
#Python
tip: Hard-coded constants should use the optional underscore as a thousands separator:
>>> x = 1_234_567
Also, you can output numbers in that format:
>>> f'{x:_d}'
'1_234_567'
Or with commas:
>>> f'{x:,d}'
'1,234,567'
#Python
factlet: random() gives you floats in the range 0.0 ≤ X < 1.0, but not all floats in that range are possible selections.
For example, 0.05954861408025609 isn't a possible selection.
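A sketch of one way to check this, assuming the default generator, which as documented returns integer multiples of 2⁻⁵³. The helper name is_possible() is invented here:

```python
from random import random

def is_possible(x):
    "True if x is an integer multiple of 2**-53 in the range 0.0 <= x < 1.0."
    # Scaling by a power of two is exact, so this test has no round-off.
    return 0.0 <= x < 1.0 and (x * 2.0**53).is_integer()

every_sample_fits = all(is_possible(random()) for _ in range(1000))
odd_one_out = is_possible(0.05954861408025609)

print(every_sample_fits, odd_one_out)
```

Small floats are spaced more finely than 2⁻⁵³, which is why values like the one above can exist without ever being selected.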
#Python
3.12 factlet: There is a new itertool called batched() for grouping data into equal sized batches (with a possible odd lot at the end).
Interestingly the corner cases of "batched(iterable, 1)" and "zip(iterable)" both give the same result.
I learned something new today :-)
Hyrum's Law:
With a sufficient number of users of an API,
it does not matter what you promise in the contract:
all observable behaviors of your system
will be depended on by somebody.
StackOverflow is plagued by people who aggressively mark questions as duplicates.
There are many distinct questions that happen to use the same library function in their answers.
For example, I can think of many distinct and nuanced questions whose answer is itertools.product().
Coming soon in
#Python
3.10: A faster way to count bits in an integer:
>>> x = 451
>>> bin(x)
'0b111000011'
>>> bin(x).count('1') # <-- old
5
>>> x.bit_count() # <-- new
5
#python
tip: collections.Counter is a subclass of dict.
As of Py3.6, dicts learned to remember insertion order.
Consequently, counters are ordered as well.
The most_common() method is like a stable sort on counts:
>>> Counter('aaabccc').most_common(2)
[('a', 3), ('c', 3)]
Four ways to write all_equal(s) in
#python
:
s[1:] == s[:-1]
min(s) == max(s) # s must be non-empty
g = itertools.groupby(s)
next(g, True) and not next(g, False)
all(map(operator.eq, s, itertools.islice(s, 1, None)))
"Returns True if all the elements are equal to each other"
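Wrapped up as functions so they can be compared side by side. The function names are made up, and the slicing version assumes s is a sequence rather than a one-shot iterator:

```python
import itertools
import operator

def all_equal_slices(s):
    return s[1:] == s[:-1]

def all_equal_minmax(s):
    return min(s) == max(s)          # s must be non-empty

def all_equal_groupby(s):
    g = itertools.groupby(s)
    # At most one group means every element compared equal.
    return next(g, True) and not next(g, False)

def all_equal_pairs(s):
    # Compare each element with its successor.
    return all(map(operator.eq, s, itertools.islice(s, 1, None)))

variants = [all_equal_slices, all_equal_minmax,
            all_equal_groupby, all_equal_pairs]

same = [fn([7, 7, 7]) for fn in variants]
mixed = [fn([7, 8, 7]) for fn in variants]
print(same)
print(mixed)
```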
Good news:
#Python
3 adoption continues to rise.
Bad news: A lot of Python 2 users don't seem to know that the end is near
(14 months left until Python 2 becomes completely unsupported).
Last month, 39.3% of downloads from PyPI were for Python 3 (excluding downloads where we don’t know what Python version they were for, like Mirrors, etc).
#Python
floating point ninja tip: Use parentheses to regroup sums to minimize accumulated round-off error.
Instead of:
a + b + c + d + e + f + g + h
Write:
((a + b) + (c + d)) + ((e + f) + (g + h))
Note, the total work is unchanged.
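The same grouping can be produced for any number of terms with a recursive split. This is only a sketch; pairwise_sum() is a name invented here, and math.fsum() serves as the accurate reference:

```python
from math import fsum

def pairwise_sum(values):
    "Add the two halves separately, then combine them."
    n = len(values)
    if n == 0:
        return 0.0
    if n == 1:
        return values[0]
    mid = n // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])

terms = [0.1] * 8
regrouped = pairwise_sum(terms)   # ((a + b) + (c + d)) + ((e + f) + (g + h))
reference = fsum(terms)           # correctly rounded total

print(regrouped, reference)
```

For eight equal terms every addition is an exact doubling, so the regrouped total matches the reference exactly.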
#python
tip: Q. How much code do you put in one statement? A. Statements should correspond to one complete thought expressible in a single sentence in plain English.
English: I'll drive if it is raining; otherwise, I'll walk:
Python: action = 'drive' if raining else 'walk'
#Python
regex tip: Alternations are tested left-to-right and the first match wins. So, put the longer matches first to avoid aliasing shorter alternatives:
>>> re.findall(r'<|>|<=|>=', '3 <= 4') # Incorrect
['<']
>>> re.findall(r'<=|>=|<|>', '3 <= 4') # Correct
['<=']