It was the afternoon of Oct 27, 2020, one day after netbox-docker released version 0.26.0. I was working remotely from home. The release notes listed no breaking changes or compatibility issues. I replaced the Docker image tag in docker-compose.yml and did the upgrade: ran docker-compose pull and docker-compose up -d.
Well, netbox was unable to boot. The logs reported: 'NoneType' object has no attribute 'lower'. I double-checked the configuration and the release documentation; everything looked fine. It seemed there were bugs in the new netbox-docker release. That was totally unexpected: it was a minor version release, and the upgrade should have been seamless.
Reverting the Docker image did not clear the error. So there were 3 options:
- Roll back to the old version and restore from backup.
- Wait for the netbox-docker team to fix it.
- Fix it myself.
For option 1, netbox was not a crucial service, and it was OK for it to be out of service for several hours. So there was time for me to try a fix before restoring from backup. For option 2, I did not want to rely on the netbox-docker team. The broken version had been out for only one day, and I had no idea whether anyone else with the same bad luck had already reported the problem. So I chose option 3.
Problem 1
The error 'NoneType' object has no attribute 'lower' was easy to locate in the source code, just by searching for the keyword lower. The buggy line lower-cased the value of an environment variable, but that variable might not be defined. In that case the value was None, and calling the lower method on None raised the error.
I created a pull request to fix it.
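The failure mode is easy to reproduce outside netbox. Below is a minimal sketch, not the actual netbox-docker code: the variable name EXAMPLE_FLAG and both function names are made up for illustration, and the fix shown (passing a string default) is just one common way to guard the call.

```python
import os

# EXAMPLE_FLAG is a hypothetical variable name, used only for illustration.


def read_flag_buggy(name="EXAMPLE_FLAG"):
    # os.environ.get returns None when the variable is not set,
    # so calling .lower() on the result raises AttributeError.
    return os.environ.get(name).lower() == "true"


def read_flag_fixed(name="EXAMPLE_FLAG", default="false"):
    # A string default guarantees that .lower() always has a str to work on.
    return os.environ.get(name, default).lower() == "true"
```

With the variable unset, read_flag_buggy() fails with the same 'NoneType' object has no attribute 'lower' message, while read_flag_fixed() simply returns False.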
Problem 2
After manually fixing problem 1, another error showed up: No such file or directory: '/tmp/metrics/counter_26.db'. So I searched for counter_26.db and found nothing. Then I tried /tmp/metrics and found that gunicorn_config.py contained it. I had no idea what this file was for, so I just set raw_env to an empty string and restarted netbox. The error disappeared.
Luckily, someone else had already created a pull request for it: just remove raw_env.
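Both problems so far were found by searching the source tree for a fragment of the error message, and shortening the search term when the full string gave no hits. A plain grep does the job; the same idea as a small Python sketch is below. The function name find_in_tree and the list of file suffixes are arbitrary choices for this example.

```python
from pathlib import Path


def find_in_tree(root, needle, suffixes=(".py", ".yml", ".yaml", ".sh", ".env")):
    """Yield (path, line_number, line) for each line under root that contains needle."""
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and file types we do not care about.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file, move on
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()


# Example usage, run from the checkout of the project being debugged:
# for path, number, line in find_in_tree(".", "/tmp/metrics"):
#     print(f"{path}:{number}: {line}")
```

Searching for the directory part of the path rather than the full file name matters here: the file name counter_26.db is generated at runtime, so it never appears in the source.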
Problem 3
Netbox was running this time and the UI opened in the browser. But LDAP authentication was broken: it reported that the LDAP configuration was missing. I double-checked the configuration. It was in the right place with the correct values.
Netbox is written in Python, so I could edit the Python files inside the Docker container and add logging to try fixes and verify them quickly. After about 2 hours, I found where the problem was and created another pull request.
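As a rough illustration of that kind of temporary logging (this is not the actual patch), the helper below reports which expected settings are absent from a loaded configuration object. The logger name, the function name missing_settings and the EXAMPLE_* setting names are all placeholders, not real netbox or LDAP settings.

```python
import logging

logger = logging.getLogger("ldap_debug")  # placeholder logger name


def missing_settings(config, required):
    """Return the names in `required` that are absent or None on `config`."""
    missing = []
    for name in required:
        value = getattr(config, name, None)
        # Log only whether the setting is present, never its value,
        # because LDAP settings usually include a bind password.
        logger.debug("setting %s present=%s", name, value is not None)
        if value is None:
            missing.append(name)
    return missing


# Example usage with a stand-in configuration object:
# from types import SimpleNamespace
# logging.basicConfig(level=logging.DEBUG)
# cfg = SimpleNamespace(EXAMPLE_SERVER_URI="ldap://example.invalid")
# missing_settings(cfg, ["EXAMPLE_SERVER_URI", "EXAMPLE_BIND_DN"])
# -> ['EXAMPLE_BIND_DN']
```

Calling a helper like this at the point where the configuration is consumed, not where it is defined, shows whether the values were lost somewhere in between.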
Lesson learned: do the upgrade in a test environment first.