Finding potential Yuletide offers with Python


I’m doing the same thing for my /yuletide signup as last year, i.e. using the event as an excuse to find myself a new VN to buy and play over the course of November/December. Doing this last year got me a new blorbo, so it was successful on all fronts really.

Helpfully, all the VNs in the tagset are marked with “(Visual Novel)” or something similar, so I wrote a Python script that gets them from a downloaded copy of the tagset page, then queries the VNDB API based on my inclusion criteria.

I need the following libraries: re for trimming the canon names from AO3; requests for making the API calls; BeautifulSoup for parsing the tagset HTML:

import re
import requests
from bs4 import BeautifulSoup

I don’t want to have to play anything massively long as I’m on a deadline, so I’ll restrict it to VNs that are 45 hours or less.

maxhours = 45

First, extracting anything marked as a VN from the tagset, then stripping each title down to just one of the piped alternatives where they exist:

with open("tagset.html","r") as tagset:
    tagsoup = BeautifulSoup(tagset,"html.parser")
    fandoms = tagsoup.find_all("li",class_="fandom")
    vns = []
    for fandom in fandoms:
        title = fandom.find("h4")
        if "Visual Novel" in title.text:
            thetitle = re.sub(" \(.*Visual Novel.*\)\n.*\n.*","",re.sub(".* \| ","",title.text.strip()))
            vns.append(thetitle)
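
To make the title cleanup concrete, here’s roughly what those two re.sub calls do to a hypothetical fandom heading (the trailing lines of markup inside the h4 are an assumption about the AO3 tagset page, and the title itself is made up):

sample = "Original Name | Localised Name (Visual Novel)\nother markup\nmore markup"
# The first sub strips everything up to the last " | ", the second strips the
# "(Visual Novel)" suffix along with the two trailing lines
print(re.sub(r" \(.*Visual Novel.*\)\n.*\n.*", "", re.sub(r".* \| ", "", sample)))
# -> Localised Name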

Then looking each of these up via the API and applying a few filters using the syntax it provides: I want VNs available in English, originally in Japanese, and available on one of the platforms I have convenient access to at the moment (this doesn’t include PC, so a lot are excluded, including apparently any BL VNs, unfortunately). After getting each result I exclude it if it’s over the 45-hour limit – this can’t be done when the initial search is performed as there’s no granular filter for length. I include the match from the tagset as part of the returned result to flag any cases where the search string has returned a false positive.

final = []

for vn in vns:
    x = requests.post(
        "https://api.vndb.org/kana/vn",
        json={
            "filters": ["and", ["lang", "=", "en"], ["search", "=", vn], ["olang", "=", "ja"],
                        ["or", ["platform", "=", "ps4"], ["platform", "=", "ps5"], ["platform", "=", "swi"]]],
            "fields": "title, length_minutes, id",
        },
    )
    try:
        for result in x.json()["results"]:
            # length_minutes can be null if nobody has submitted a play time, so skip those too
            if result["length_minutes"] is not None and result["length_minutes"] < (maxhours * 60):
                final.append(result["title"] + ": https://vndb.org/" + result["id"] + " (" + vn + ")")
    except requests.exceptions.JSONDecodeError:
        # error responses from the API are plain text rather than JSON
        pass
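
If you want a feel for what the API sends back before filtering, a one-off query along these lines (a debugging sketch, with the search string as a placeholder) dumps the parsed response. Each entry in results has an id that’s a string like "v1234", which is why it can be concatenated straight onto https://vndb.org/ above, and a length_minutes that may be null:

import json

# One-off request just to inspect the response shape; "Some VN Title" is a placeholder
resp = requests.post(
    "https://api.vndb.org/kana/vn",
    json={"filters": ["search", "=", "Some VN Title"], "fields": "title, length_minutes, id"},
)
print(json.dumps(resp.json(), indent=2))
# Expect {"results": [{"id": "v...", "title": ..., "length_minutes": ...}, ...], "more": false}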

Then removing duplicates:

final = sorted(dict.fromkeys(final))  # dict.fromkeys drops duplicates, sorted alphabetises what's left
for vn in final:
    print(vn)
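
As a quick illustration of that dedup step (the strings here are placeholders), dict.fromkeys keeps one copy of each line and sorted then puts them in alphabetical order:

lines = ["B: https://vndb.org/v2 (B)", "A: https://vndb.org/v1 (A)", "B: https://vndb.org/v2 (B)"]
print(sorted(dict.fromkeys(lines)))
# ['A: https://vndb.org/v1 (A)', 'B: https://vndb.org/v2 (B)']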

Once I’d taken out the false positives, I was left with about 19 VNs, which I looked up on VNDB to find the ten that seemed most appealing. I’ll make those ten my Yuletide offer and then buy + play whichever ends up being my assignment.