Unemployed

/media/images/marvelous.png

Now that that pesky gainful employment is over, I can get back to the important things: working on the back end of a website which no one reads. I was let go along with hundreds of other engineers during an 'internal restructuring' and have decided to catch up on my neglected personal projects while I live off my savings for a while.

The static document generation scripts had grown over the years into a gnarled tangle of redundant functionality. It was often easier to just write a new implementation to get something done than to try to refactor what was there. Also, I've been running this website for more than a decade and my Python has gotten better. As a programmer it was irritating to see:

measured_response = 'hey {} you'.format(explicitive)

Rather than:

measured_response = f'hey {explicitive} you'

It was even worse to see both implementations living side by side. I wanted to remove the old code.

I use static generation for several bits of this site:

  • single page documents
  • double page documents (which have their own template with a table of contents and sidebar)
  • recipes
  • pictures

Each involved running a different script that used its own particular method to index images, generate thumbnails, and render the markup into html.

I added yet another way of doing things when I wrote the routines behind the recipe generation. The recipe static generation scripts were written with hash files which meant I could just type

build_recipes

and it would use the existing hash files to determine if any file had to be updated. Because checking a few hashes on every directory is practically instantaneous, I could automatically discover documents that needed to be re-rendered from markup to html.

I wanted the same quick feedback for 'build_docs' and 'build_pics'. Building pictures was almost there but building the two flavors of documents was a painfully manual process.

So a couple weeks ago I started to consolidate and rewrite the static generation scripts. This involved rewriting some of the content to make everything consistent. There are only 8 different static documents, but these contain 21,000 files. As the static scripts matured over the years I added features like a table of contents file, which let me put chapters in something other than alphabetical order, and this had to be bolted on to the older documents.
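To make the table of contents idea concrete, here is a minimal sketch of how such a file could drive chapter order. The file name 'toc.txt', the one-filename-per-line layout, the '#' comment lines, and the function name are all assumptions for illustration, not the format the site actually uses.

```python
from pathlib import Path


def ordered_chapters(doc_dir, toc_name='toc.txt'):
    ''' Return the chapter files of a document in reading order.
    Assumes a hypothetical toc file with one chapter filename per line.
    Falls back to alphabetical order when there is no toc file.
    '''
    doc_dir = Path(doc_dir)
    toc = doc_dir / toc_name
    if not toc.exists():
        # Older documents: no toc file, so sort by filename
        return sorted(doc_dir.glob('*.rst'))

    names = [line.strip() for line in toc.read_text().splitlines()]
    # Skip blank lines and '#' comment lines
    return [doc_dir / name for name in names if name and not name.startswith('#')]
```

A document without a toc file keeps its old alphabetical behaviour, which is roughly what bolting the feature on to older documents needs.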

I used a lot of hash files because they were easy and fast:

import hashlib


def write_hash(filepath):
        ''' Update the hashfile for the given rest file
        '''
        hashpath = filepath.parent/f'.{filepath.stem}.hash'
        hashpath.write_text(get_hash(filepath))


def get_hash(filepath):
        ''' Return the hash string for a given file
        '''
        contents = filepath.read_bytes()
        m = hashlib.sha256()
        m.update(contents)
        return m.hexdigest()


def hash_changed(filepath):
        ''' Uses a hash file to determine if the rst file has changed.
        Returns a bool
        '''
        hashpath = filepath.parent/f'.{filepath.stem}.hash'
        if not hashpath.exists():
                print(f"{hashpath} doesn't exist!")
                return True

        current_hash = get_hash(filepath)
        if current_hash != hashpath.read_text():
                print(f"{filepath} hash differs from cached version")
                return True

        return False

With these generic functions I could hash the markup file, the table of contents, and the 'resize' file used to indicate what size I want my thumbnails. I even developed a super fast media file hash routine that could generate a hash from all the media in a directory. This function could process 1 GB of media in 0.5 seconds, but with 6 GB of media it was too slow to run often.
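For comparison, a directory fingerprint does not have to read file contents at all. The sketch below is not the media hash routine described above. It is a cheaper alternative that assumes a change in a file's name, size, or modification time is enough to signal that the media changed. The name 'dir_fingerprint' and the list of suffixes are made up for the example.

```python
import hashlib
from pathlib import Path


def dir_fingerprint(directory, suffixes=('.png', '.jpg', '.jpeg', '.gif', '.mp4')):
    ''' Return one hash string for all the media files under a directory.
    Only metadata is hashed (relative path, size, mtime), never the contents.
    '''
    directory = Path(directory)
    m = hashlib.sha256()
    # Sort so the result does not depend on filesystem listing order
    for path in sorted(directory.rglob('*')):
        if path.is_file() and path.suffix.lower() in suffixes:
            stat = path.stat()
            relative = path.relative_to(directory).as_posix()
            m.update(f'{relative}|{stat.st_size}|{stat.st_mtime_ns}\n'.encode())
    return m.hexdigest()
```

The trade-off is that an edit which preserves both size and modification time would go unnoticed, so this only suits cases where that risk is acceptable.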

The document generation routines used to require an argument indicating which document to update, but now with these changes I can type 'build_docs', 'build_recipes', or 'build_pics' and it will rebuild things as necessary.
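The no-argument build boils down to a loop like the one below. This is a minimal sketch rather than the actual build scripts: it assumes the markup files end in '.rst', reuses the '.name.hash' naming from the hash functions above, and takes a hypothetical 'render' callback in place of the real markup to html step.

```python
import hashlib
from pathlib import Path


def file_hash(path):
    ''' sha256 of the file contents, same idea as get_hash above '''
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_stale(root, pattern='*.rst'):
    ''' Yield every source file whose hash file is missing or out of date '''
    for source in sorted(Path(root).rglob(pattern)):
        hashpath = source.parent / f'.{source.stem}.hash'
        if not hashpath.exists() or hashpath.read_text() != file_hash(source):
            yield source


def build(root, render, pattern='*.rst'):
    ''' Render each stale file, then record its new hash.
    render is a hypothetical callback that turns one markup file into html.
    Returns the list of files that were rebuilt.
    '''
    rebuilt = []
    for source in find_stale(root, pattern):
        render(source)
        hashpath = source.parent / f'.{source.stem}.hash'
        hashpath.write_text(file_hash(source))
        rebuilt.append(source)
    return rebuilt
```

Running build twice in a row renders nothing the second time, because every hash file already matches its source.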

But you can see the final result and all the work that led up to this for yourself here. I'm happy to report the latest Gitea works much better than the previous version (which required manually poking the database to get it to show my changes) and I will be hosting more of my git projects here as I go through them.

If you see broken links on any of the document pages let me know.