I have a custom feed aggregator written in Python. To be well behaved, it keeps a cache of previous feeds it's fetched. It unpickles the cache from a file after it starts, and pickles it to a file before it exits.
Every few months it would throw this exception while it's pickling the cache to a file:
Traceback (most recent call last):
  File "rss.py", line 404, in <module>
    pickle.dump(cache, f)
ValueError: I/O operation on closed file.
It would only happen once. If I ran it again, no exception.
This is the pickle.dump call on that line:
with open(cache_file, 'wb') as f: pickle.dump(cache, f)
How can the file be closed if I opened it on the line before?
Turns out, it's not. It's a different file. Here's how I figured that out.
The cache object being pickled contained, deep in its hierarchy, a file-like object that was closed. That was the source of the exception.
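The core failure is easy to reproduce in isolation: an open io.BytesIO pickles fine, but a closed one raises the same ValueError. A minimal sketch:

```python
import io
import pickle

buf = io.BytesIO(b"feed bytes")
# An open BytesIO round-trips through pickle without complaint.
restored = pickle.loads(pickle.dumps(buf))
assert restored.getvalue() == b"feed bytes"

buf.close()
# A closed one raises ValueError: I/O operation on closed file.
try:
    pickle.dumps(buf)
    error = None
except ValueError as e:
    error = str(e)
```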
First I had to find a reproducible case and... I just waited. It took about six months until it happened reproducibly. Once I had that, I started debugging.
Debugging was frustrating at first. Every exception pointed to the pickle.dump call, and I couldn't step into it because it's a native function written in C. Fortunately there's a pure Python version of pickle. It's only used if the native one can't be loaded, but its functions are still there with underscore prefixes.
Changing the call to this allowed me to step into pickle:
pickle._dump(cache, f)
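As a sanity check that the underscore-prefixed function behaves like the real thing (a sketch; pickle._dump is a CPython implementation detail, not documented API):

```python
import io
import pickle

# pickle._dump is the pure Python implementation of pickle.dump.
# The C version from _pickle normally shadows it, but the Python
# one stays importable, so a debugger can step through it.
buf = io.BytesIO()
pickle._dump({"feed": "cached"}, buf)
roundtrip = pickle.loads(buf.getvalue())
```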
The object being pickled when the exception was thrown looked interesting:
<_io.BytesIO object at 0x0000015373D15A80>
That's a file-like object. I wonder what its closed property is?
p obj.closed
True
Hello closed "file". Where did you come from?
Pickle is recursive, so I could look through the stack and see the object hierarchy. The io.BytesIO object is inside a SAXParseException. Where did that come from?
My feed aggregator uses feedparser. When that fails to parse a feed, it returns an object containing the exception. In my repro case, a feed would always fail to parse and return a SAXParseException.
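As I understand it, feedparser does its XML parsing on top of the standard library's xml.sax machinery, which is where that exception type comes from. A stdlib-only sketch of how a malformed feed produces one:

```python
import xml.sax

# A truncated document makes the SAX parser raise SAXParseException.
# The exception object keeps a reference back into the parser's state
# (its locator), which is how stream objects can end up inside it.
try:
    xml.sax.parseString(b"<rss><channel>", xml.sax.ContentHandler())
    caught = None
except xml.sax.SAXParseException as exc:
    caught = exc
```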
The aggregator then put this object into the cache without checking whether it contained an exception. Later on, pickle tried to serialize this exception and threw the ValueError.
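A guard at the point of caching would have avoided this. A sketch (plain dicts stand in for feedparser's result objects here; the real ones carry a truthy bozo flag and a bozo_exception attribute when parsing failed):

```python
def cache_result(cache, url, parsed):
    # feedparser marks results it couldn't parse cleanly with a
    # truthy "bozo" flag and attaches the exception; skip caching
    # those, since the exception can drag unpicklable objects along.
    if parsed.get("bozo"):
        return False
    cache[url] = parsed
    return True

cache = {}
ok = cache_result(cache, "https://example.com/good", {"bozo": 0, "entries": []})
bad = cache_result(cache, "https://example.com/broken", {"bozo": 1})
```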
But why does the closed io.BytesIO object throw that exception when it's pickled? For that, we need to look at its source.
An object can override __getstate__ to change how it's pickled. io.BytesIO does this, and will throw a ValueError if its internal buffer is closed (i.e. null). I'm not sure when this particular io.BytesIO gets closed, but it's wrapped around the response body of the HTTP request that fetched the feed. That itself is a file-like object, and has probably long since been closed.
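You can see that check without pickle in the loop by calling __getstate__ directly:

```python
import io

buf = io.BytesIO(b"response body")
state = buf.__getstate__()   # works while open; state[0] is the contents
buf.close()
try:
    buf.__getstate__()       # raises ValueError once the buffer is closed
    raised = False
except ValueError:
    raised = True
```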
So, the problem is that feedparser returns an object, that contains a SAXParseException, that contains an io.BytesIO object, that is closed, and my feed aggregator tries to pickle it.
I fixed this in two ways:
If you get a weird ValueError on a pickle.dump call, check what you're pickling.