Source of Literate Wordle
Below is the raw org-mode file that was rendered to Literate Programming Wordle in the previous documentation page.
This is intended for people curious about what the plain-text file behind the “novel” looks like. If you just want to read the novel comfortably, take a look at the previous page instead.
#+TITLE: Wordle
#+OPTIONS: ^:nil
# Shell steps should show results verbatim (not tables) and don't rerun on export
#+PROPERTY: header-args:shell :results verbatim :eval no-export
I've recently written [[https://jiby.tech/post/gherkin-features-user-requirements/][a series of blog posts about Gherkin]], the [[https://jiby.tech/post/bdd-dreams-cucumber-and-gherkin/][Behaviour-driven
development movement,]] and how Cucumber (the BDD tool of choice) failed to
live up to expectations.
I wanted to showcase the BDD-inspired low-tech solution I came up with via a
toy project, demonstrating a small but significant programming task, broken down
into a series of design-implementation cycles.
Wordle is a perfect target: it's a small codebase, with a half dozen features to
string together into a usable game.
In order to document the process, the code is written via literate programming.
#+begin_quote
Literate programming is the art of writing code as if it was a novel (or blogpost), writing down what's needed, explaining the reasoning, and weaving in code snippets that add up to the codebase as we grow in understanding. The result is a "story" which can be read, but also "tangled" back into a proper codebase that works normally.
#+end_quote
For more context on the code repository (how to use, etc), please read the project
readme.
See also the online, pretty rendered version of this document on my personal
website: https://jiby.tech/project/literate_wordle/wordle.html
* Mise en bouche: picking an answer
To get us started, let's cover the very first behaviour Wordle has to do: pick a
word that will become our secret answer.
As the first iteration in a test-driven project, it's important that we set up
all the components we'll need going forwards.
First, let's formalise our first requirement a little, using Gherkin Features.
For context as to why/how we're doing this, read [[https://jiby.tech/post/gherkin-features-user-requirements/][my post on gathering
requirements via Gherkin.]]
#+NAME: feature1
#+CAPTION: Our first requirement. Tangled (exported) to =features/pick_answer_word.feature=
#+begin_src feature :tangle features/pick_answer_word.feature
Feature: Pick an answer word
As a Wordle game
I need to pick a random 5 letter word
In order to let players guess it
#+end_src
Right. That's fairly straightforward, but the secret word can't just be random
characters, it needs to be a proper word. So we need to find a dictionary to
pick from.
** TDD for picking word functionality
We want to write a test that validates that we can indeed pick a random word. But "Random" and "test" together should make anybody wince at the idea of
non-deterministic testing.
We /could/ write a test that picks a word, then confirms the word came
from the dictionary file, but writing that test would mean re-implementing the entirety of the
feature we're testing, as well as relying on the internals of the implementation
being correct. That's very wrong.
A good alternative would be to pin down the randomness (making the test
deterministic) by anchoring the random seed to a known value, allowing
repeatable testing. But this is just the first test in a new project, and we want a
simple check to start with, so we compromise on the assertion "is the
randomly picked word five letters long?"
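For comparison, here is roughly what the seed-pinning alternative could look
like. This is a sketch only, assuming a hypothetical =pick_answer_word_seeded=
variant that takes its own =random.Random= instance, and a tiny made-up =WORDS=
list in place of the real dictionary; the project itself sticks with the simpler
length check. Two generators seeded with the same value always draw the same
word, so the assertion is repeatable:
#+begin_src python :tangle no
import random

# Made-up stand-in for the real dictionary, which lives in a file
WORDS = ["crane", "slate", "stink", "blank", "adieu"]


def pick_answer_word_seeded(rng: random.Random) -> str:
    """Hypothetical variant of pick_answer_word taking an injected RNG"""
    return rng.choice(WORDS)


def test_pick_word_is_repeatable_with_seed():
    """Same seed, same word: randomness pinned down for testing"""
    first = pick_answer_word_seeded(random.Random(1234))
    second = pick_answer_word_seeded(random.Random(1234))
    assert first == second, "Same seed should give the same word"
    assert first in WORDS, "Picked word should come from the dictionary"
#+end_src
Injecting the generator (rather than calling =random.seed= globally) keeps the
seeding local to the test, so the randomness of other tests isn't affected.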
So we write down a new test file, under the =tests/= folder, starting with a
file-level docstring that references the Gherkin feature this enforces.
#+CAPTION: Test driving the first feature. Note the =pick_answer_word()= function doesn't exist yet, we're dreaming it up!
#+begin_src python :tangle tests/test_pick_word.py
"""Validates the Gherkin file features/pick_answer_word.feature:
Feature: Pick an answer word
As a Wordle game
I need to pick a random 5 letter word
In order to let players guess it
"""
from literate_wordle.words import pick_answer_word
def test_pick_word_ok_length():
    """Confirm a wordle solution is of right size"""
    assert len(pick_answer_word()) == 5, "Picked wordle solution is wrong size!"
#+end_src
Of course, since that feature isn't implemented (not even the module's
skeleton), running the tests right now would crash with import errors, rather
than give a red light.
So let's implement the barest hint of the =pick_answer_word= function that
returns the wrong thing, to make the test run and fail:
#+CAPTION: First, the docstring for a new python module =words.py=, under the =literate_wordle= package.
#+begin_src python :tangle no
"""Dictionary features to back wordle solutions"""
#+end_src
In that module, let's add the skeleton for our =pick_answer_word= function, but
return an invalid result, to make the test explicitly fail:
#+begin_src python :tangle no
def pick_answer_word() -> str:
    """Pick a Wordle solution/answer from wordle dictionary"""
    return ""  # Incorrect solution to get RED test
#+end_src
With our test ready, and a dummy function in place, let's see the tests go red:
# To avoid crashing org-mode, run these tests via: make test 2>&1 || true
#+CAPTION: Running the tests.
#+begin_src shell :exports both
make test
#+end_src
#+RESULTS:
#+begin_example
poetry run pytest
============================= test session starts ==============================
platform linux -- Python 3.9.5, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/jiby/dev/ws/short/literate_wordle/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/jiby/dev/ws/short/literate_wordle, configfile: pyproject.toml
plugins: cov-3.0.0, datadir-1.3.1, clarity-1.0.1
collecting ... collected 2 items
tests/test_pick_word.py::test_pick_word_ok_length FAILED [ 50%]
tests/test_version.py::test_version PASSED [100%]
=================================== FAILURES ===================================
___________________________ test_pick_word_ok_length ___________________________
def test_pick_word_ok_length():
"""Confirm a wordle solution is of right size"""
> assert len(pick_answer_word()) == 5, "Picked wordle solution is wrong size!"
E AssertionError: Picked wordle solution is wrong size!
E assert == failed. [pytest-clarity diff shown]
E
E LHS vs RHS shown below
E
E 0
E 5
E
tests/test_pick_word.py:13: AssertionError
- generated xml file: /home/jiby/dev/ws/short/literate_wordle/test_results/results.xml -
=========================== short test summary info ============================
FAILED tests/test_pick_word.py::test_pick_word_ok_length - AssertionError: Pi...
========================= 1 failed, 1 passed in 0.07s ==========================
make: *** [Makefile:16: test] Error 1
#+end_example
As pytest mentions, we should see a wordle solution of 5 letters, not zero.
So the test indeed failed as expected, we can now make it pass by implementing
the feature.
Taking a quick step back, think of how conveniently TDD lets us "dream up an
API", by describing functions and files that don't need to exist yet.
** Solutions dictionary file
Since we're trying to match the Wordle website's implementation, let's reuse
Wordle's own dictionary. Someone [[https://raw.githubusercontent.com/AllValley/WordleDictionary/main/wordle_solutions_alphabetized.txt][helpfully uploaded it]]. Let's download it:
#+begin_src shell :tangle no
wget \
--output-document "wordle_answers_dict.txt" \
"https://raw.githubusercontent.com/AllValley/WordleDictionary/6f14d2f03d01c36fe66e3ccc0929394251ab139d/wordle_solutions_alphabetized.txt"
#+end_src
Except an alphabetically sorted text file takes up space for no good reason.
Let's compress it pre-emptively.
While this can legitimately be seen as a premature optimization, we can also see
it as "flattening" a static text file into a binary "asset" that can be
bundled into the project's package, like icons are part of webapps.
#+begin_src shell :tangle no :exports both
ANSWERS_FILE="wordle_answers_dict.txt"
# Get raw file size in kilobytes
du -k "${ANSWERS_FILE}"
# Compress the file (removes original)
gzip "$ANSWERS_FILE"
# Check size after compression
du -k "${ANSWERS_FILE}.gz"
#+end_src
#+RESULTS:
: 16 wordle_answers_dict.txt
: 8 wordle_answers_dict.txt.gz
Sweet, we have cut down the filesize by half.
** Importing dictionary: static/packaged asset file read
At first glance, the implementation of the function we want is simple, it looks
roughly like this:
#+begin_src python :tangle no
with open("my_dictionary.txt", "r") as fd:
    my_text = fd.read()
#+end_src
One just needs to find the right file path to open, and add some sprinkles to
deal with compression. Sure enough, that part is fairly easy.
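Those "sprinkles" can be as small as opening the file through =gzip.open= in
text mode. A minimal sketch, with a hypothetical =read_gzipped_words= helper and
a made-up =my_dictionary.txt.gz= written to a temporary folder. Note it still
relies on knowing the file's path, which is exactly the catch discussed next:
#+begin_src python :tangle no
import gzip
import os
import tempfile


def read_gzipped_words(path: str) -> list[str]:
    """Hypothetical helper: read a gzipped, newline-delimited word list"""
    # "rt" mode decompresses and decodes to text in one go (ASCII assumed)
    with gzip.open(path, "rt", encoding="ascii") as fd:
        return [line.strip().lower() for line in fd if line.strip()]


# Usage: write a small compressed file, then read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "my_dictionary.txt.gz")
    with gzip.open(demo_path, "wt", encoding="ascii") as fd:
        fd.write("Crane\nslate\n")
    print(read_gzipped_words(demo_path))  # ['crane', 'slate']
#+end_src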
The issue is that we're trying to write a python package here, which means it could
be downloaded via =pip install= and installed in an arbitrary location on
someone's computer. Our code needs to refer to the file as "the file XYZ inside
the assets folder of our package". We need to look up how to express that.
From [[https://stackoverflow.com/a/20885799][Stackoverflow on reading static files from inside Python package]], we can
use the =importlib.resources= module, since our project requires =Python 3.9= onwards.
So we'll move our dictionary zip file into a new module (folder) called
=assets=, which will be a proper python module that can be imported from:
#+CAPTION: Moving our dictionary to the new =assets= sub-module.
#+begin_src shell :tangle no
mkdir -p src/literate_wordle/assets/
# A proper python module means an __init__.py: Give it a docstring
echo '"""Static binary assets (dictionaries) required to perform Wordle"""' > src/literate_wordle/assets/__init__.py
mv wordle_answers_dict.txt.gz src/literate_wordle/assets/
#+end_src
With the file in place, let's flesh out the =words= module we stubbed earlier, to provide the =pick_answer_word= function.
#+CAPTION: Defining new python module under =src/literate_wordle/words.py=, starting with docstrings.
#+NAME: choice-module-docstring
#+begin_src python :tangle no
"""Dictionary features to back wordle solutions"""
#+end_src
#+NAME: choice-stdlib
#+CAPTION: Necessary imports from the standard library.
#+begin_src python :tangle no
import gzip
import importlib.resources as pkg_resources
#+end_src
#+NAME: choice-locallib
#+CAPTION: Local import of new =assets/= folder
#+begin_src python :tangle no
from . import assets # Relative import of the assets/ folder
#+end_src
We need a convenience function to load the zip file into a list of strings.
#+NAME: choice-func-unzipdict
#+CAPTION: Actual function to unzip the dictionary. Note the newline-delimited words are lowercased into a list of word strings, with surrounding whitespace stripped
#+begin_src python :tangle no
def get_words_list() -> list[str]:
    """Decompress the wordle dictionary"""
    dict_compressed_bytes = pkg_resources.read_binary(
        assets, "wordle_answers_dict.txt.gz"
    )
    dict_string = gzip.decompress(dict_compressed_bytes).decode("ascii")
    # splitlines() + filter: a trailing newline must not yield an empty "word"
    answer_word_list = [
        word.strip().lower() for word in dict_string.splitlines() if word.strip()
    ]
    return answer_word_list
#+end_src
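As an aside, Python 3.9 also added =importlib.resources.files()=, which returns
a traversable object for a package's resources. Here is a sketch of the same
loader written against it; =get_words_list_via_files= and the throwaway
=demo_assets= package (standing in for =literate_wordle.assets=) are
hypothetical. It also splits with =splitlines()= and skips blank lines, so a
trailing newline in the file never turns into an empty "word":
#+begin_src python :tangle no
import gzip
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path


def get_words_list_via_files(package: str, asset_filename: str) -> list[str]:
    """Hypothetical variant of get_words_list built on importlib.resources.files()"""
    asset = resources.files(package).joinpath(asset_filename)
    text = gzip.decompress(asset.read_bytes()).decode("ascii")
    # Skip blank lines, so a trailing newline never becomes an empty "word"
    return [word.strip().lower() for word in text.splitlines() if word.strip()]


# Usage: build a throwaway package on disk that ships a gzipped asset
with tempfile.TemporaryDirectory() as tmp_dir:
    package_dir = Path(tmp_dir) / "demo_assets"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text('"""Demo assets package"""\n')
    (package_dir / "words.txt.gz").write_bytes(gzip.compress(b"CRANE\nslate\n"))
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()  # make sure the new package is discoverable
    try:
        print(get_words_list_via_files("demo_assets", "words.txt.gz"))  # ['crane', 'slate']
    finally:
        sys.path.remove(tmp_dir)
#+end_src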
Ideally we would write a test dedicated to this function, but our
already-failing acceptance test pretty much covers this entire feature, so
it's not worth it just now. This is one of those tradeoffs we make between toy
projects and long-term maintainability of code as a team.
With the word list in hand, writing out the pick function is trivial:
#+CAPTION: Import from standard library for randomness
#+NAME: choice-stdlib2
#+begin_src python :tangle no
from random import choice
#+end_src
#+NAME: choice-func-pickanswer
#+CAPTION: Pick-a-word! Pretty simple, with all the legwork we did already.
#+begin_src python :tangle no
def pick_answer_word() -> str:
    """Pick a single word out of the dictionary of answers"""
    return choice(get_words_list())
#+end_src
With the function implemented, we can try it out in a Python REPL (Read Eval
Print Loop, also known as an interactive interpreter):
#+CAPTION: Open an interactive python session, ask for a random word twice.
#+begin_src shell :tangle no :exports both
poetry run python3
>>> from literate_wordle import words
>>> print(words.pick_answer_word())
stink
>>> print(words.pick_answer_word())
blank
#+end_src
Perfect! So the test should now pass, right?
#+begin_src shell :tangle no :exports both
make test
#+end_src
#+RESULTS:
#+begin_example
poetry run pytest
============================= test session starts ==============================
platform linux -- Python 3.9.5, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/jiby/dev/ws/short/literate_wordle/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/jiby/dev/ws/short/literate_wordle, configfile: pyproject.toml
plugins: cov-3.0.0, datadir-1.3.1, clarity-1.0.1
collecting ... collected 2 items
tests/test_pick_word.py::test_pick_word_ok_length PASSED [ 50%]
tests/test_version.py::test_version PASSED [100%]
- generated xml file: /home/jiby/dev/ws/short/literate_wordle/test_results/results.xml -
============================== 2 passed in 0.03s ===============================
#+end_example
Acceptance tests pass, and linters are happy (not pictured, use =make= to
check).
Because the acceptance tests pass, the feature is ready to ship!
That's the BDD guarantee.
Of course, keen readers will notice sub-optimal code, like how we're unzipping
the entire solutions file on each requested answer. Because "picking a solution
word" is something done on the order of /once/ over the /entire runtime/ of a
Wordle session, we choose to let this performance wart be.
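Should that cost ever start to matter, one option (not used in this project) is
to memoise the loader with =functools.lru_cache=, so decompression runs only
once per process. A minimal sketch, where =get_words_tuple= is a hypothetical
stand-in returning a hardcoded tuple instead of unzipping the real file, and
=LOAD_CALLS= merely counts how often the loader body runs:
#+begin_src python :tangle no
from functools import lru_cache
from random import choice

LOAD_CALLS = 0  # counts how often the (expensive) loader body actually runs


@lru_cache(maxsize=None)
def get_words_tuple() -> tuple[str, ...]:
    """Hypothetical cached loader, standing in for unzipping the answers file"""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return ("crane", "slate", "stink", "blank")


def pick_answer_word_cached() -> str:
    """Pick an answer; only the first call pays for loading the words"""
    return choice(get_words_tuple())


for _ in range(3):
    pick_answer_word_cached()
print(LOAD_CALLS)  # 1
#+end_src
Returning an immutable tuple matters here: the cache hands the same object to
every caller, so a mutable list could be modified by one caller and silently
corrupted for all the others.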
** Debriefing on the method
We just completed our first loop: determine a small component that needs
implementing to build towards the Wordle goal, spell it out with Gherkin features,
express the feature as an acceptance test, and iterate on the new RED test until it becomes
green, then ship the feature.
Common TDD workflow adds a refactor or "blue" component to the cycle, which is
indeed necessary for production code, as it lends maintainability (the first
draft of a codebase usually takes big shortcuts). But this project is
meant as entertainment material, and proper refactoring would mean refactoring the =wordle.org=
source file, which would drown out the nice narrative we're building here, so
let's leave it at that.
Along the way, the code blocks spelled out in this narrative-oriented file are
tangled out into proper code paths, so that the =Makefile= can pick them up and
validate the proper package-ness. We'll see as we implement the next feature how
such a weaving of code snippets works.
* Confirming guess is a valid word
Now that we can pick secret words, we need to start processing guesses. The very
first thing we need is to validate that guesses are proper words, of the right
size. This feature will give us a familiar context (dictionaries), while slowly
ramping up the details of the Gherkin features:
#+NAME: feature-check-valid-guess
#+CAPTION: New Gherkin feature file =features/checking_guess_valid_word.feature=
#+begin_src feature :tangle features/checking_guess_valid_word.feature
Feature: Checking a guess is a valid word
As a Wordle game
I need to confirm each guessed word is valid
So that I only accept real words, no kwyjibo
#+end_src
In practice, this means multiple things:
#+NAME: scenario-check-valid-guess
#+CAPTION: Scenarios to describe the feature in details
#+begin_src feature :tangle features/checking_guess_valid_word.feature
Scenario: Reject long words
When guessing "affable"
Then the guess is rejected
And reason for rejection is "Guess too long"
Scenario: Reject short words
When guessing "baby"
Then the guess is rejected
And reason for rejection is "Guess too short"
Scenario: Reject fake words via dictionary
When guessing "vbpdj"
Then the guess is rejected
And reason for rejection is "Not a word from the dictionary"
Scenario: Accept five letter dictionary words
When guessing "crane"
Then the guess is accepted
#+end_src
So, with a feature covering these scenarios, we can start laying out acceptance
tests.
Since I like to include the Gherkin feature file inside the
docstrings of Python tests, I'll take advantage of having already
written the feature above, and reference it so it gets templated into the code snippet:
#+NAME: scenario-check-tangle-noweb
#+CAPTION: New test file's module-level docstring, using (invisible during rendering) templating to fill in the gherkin feature from Listing [[feature-check-valid-guess]]
#+begin_src python :tangle tests/test_checking_guess_valid_word.py :noweb yes
"""Validates the Gherkin file features/checking_guess_valid_word.feature:
<<feature-check-valid-guess>>
<<scenario-check-valid-guess>>
"""
#+end_src
Just this once, I'll show how the templating happens behind the scenes:
#+NAME: scenario-check-tangle-withoutnoweb
#+CAPTION: Same code block as Listing [[scenario-check-tangle-noweb]], but without the magic templating enabled: each name wrapped in double chevrons (=<<...>>=) references a named code block from above.
#+begin_src python :tangle no
"""Validates the Gherkin file features/checking_guess_valid_word.feature:
<<feature-check-valid-guess>>
<<scenario-check-valid-guess>>
"""
#+end_src
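Org-babel performs this expansion itself when tangling. Purely to illustrate the
idea, here is a toy Python version (not org-babel's actual implementation, which
supports more syntax): each line holding only a =<<name>>= reference is replaced
by the body of the named block, keeping the reference's indentation. The
=BLOCKS= contents are shortened, hypothetical stand-ins for the real named blocks:
#+begin_src python :tangle no
import re

# Shortened stand-ins for the #+NAME: blocks defined above
BLOCKS = {
    "feature-check-valid-guess": "Feature: Checking a guess is a valid word",
    "scenario-check-valid-guess": "Scenario: Reject long words\nWhen guessing \"affable\"",
}

# A line consisting only of optional indentation and a <<name>> reference
NOWEB_REF = re.compile(r"^([ \t]*)<<([\w-]+)>>[ \t]*$", re.MULTILINE)


def expand_noweb(template: str, blocks: dict[str, str]) -> str:
    """Toy noweb expansion: swap each <<name>> line for the named block's body"""

    def substitute(match: re.Match) -> str:
        indent, name = match.group(1), match.group(2)
        # Prefix every expanded line with the reference's own indentation
        return "\n".join(indent + line for line in blocks[name].splitlines())

    return NOWEB_REF.sub(substitute, template)


template = '"""Validates the Gherkin file:\n<<feature-check-valid-guess>>\n<<scenario-check-valid-guess>>\n"""'
print(expand_noweb(template, BLOCKS))
#+end_src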
** Test setup
With the feature described, let's import the (still hypothetical) function under test:
#+NAME: test-valid-import
#+CAPTION: Import a new function we'll be defining
#+begin_src python :tangle no
from literate_wordle.words import check_valid_word
#+end_src
#+CAPTION: A simple test using the first scenario
#+NAME: test-valid-1
#+begin_src python :tangle no
def test_reject_long_words():
    """Scenario: Reject long words"""
    # When guessing "affable"
    guess = "affable"
    is_valid, reject_reason = check_valid_word(guess)
    # Then the guess is rejected
    assert not is_valid, "Overly long guess should have been rejected"
    # And reason for rejection is "Guess too long"
    assert reject_reason == "Guess too long"
#+end_src
Notice the pattern of referencing the Gherkin Scenario as comments inside the
test. This practice is something I came up with on my own after being a bit
disappointed with Cucumber. You can read more about it in [[https://jiby.tech/post/low-tech-cucumber-replacement/][my post on low-tech
cucumber replacement]].
#+CAPTION: The opposite test, text too short
#+NAME: test-valid-2
#+begin_src python :tangle no
def test_reject_overly_short_words():
    """Scenario: Reject short words"""
    # When guessing "baby"
    guess = "baby"
    is_valid, reject_reason = check_valid_word(guess)
    # Then the guess is rejected
    assert not is_valid, "Overly short guess should have been rejected"
    # And reason for rejection is "Guess too short"
    assert reject_reason == "Guess too short"
#+end_src
And finally, the dictionary checks:
#+CAPTION: Non-dictionary words test
#+NAME: test-valid-3
#+begin_src python :tangle no
def test_reject_nondict_words():
    """Scenario: Reject fake words via dictionary"""
    # When guessing "vbpdj"
    guess = "vbpdj"
    is_valid, reject_reason = check_valid_word(guess)
    # Then the guess is rejected
    assert not is_valid, "Word not in dictionary should have been rejected"
    # And reason for rejection is "Not a word from the dictionary"
    assert reject_reason == "Not a word from the dictionary"
#+end_src
#+CAPTION: Dictionary words test
#+NAME: test-valid-4
#+begin_src python :tangle no
def test_accept_dict_words():
    """Scenario: Accept five letter dictionary words"""
    # When guessing "crane"
    guess = "crane"
    is_valid, reject_reason = check_valid_word(guess)
    # Then the guess is accepted
    assert is_valid, "Correct length word in dictionary should have been accepted"
#+end_src
One tiny detail regarding this last example, which highlights why separating
Gherkin from actual code is important: We describe in the positive scenario the
need to accept a correct word in terms of "not rejecting", which in code maps to
the =is_valid= boolean. That's sufficient to validate the original Gherkin
scenario, which is what we think of when designing the software.
But as we see in the implementation, there's also the matter of the
=reject_reason= component, which we should check for emptiness. That emptiness is an
implementation detail, which has no reason to be laid out in the original
scenario, but is still valid to make assertions on as part of the
implementation's check. So we add the following line to the test:
#+NAME: reject-reason-none
#+CAPTION: Appended line to Listing [[test-valid-4]]. Doesn't map back to Gherkin, because it is an implementation detail, not part of the feature's requirement itself. Still worth checking, in practice.
#+begin_src python :tangle no
    assert reject_reason is None, "Accepted word should have no reason to be rejected"
#+end_src
With all these (high-level) tests in hand, let's write up a minimal
implementation to get RED tests instead of a crash.
First up is defining the function's signature: Simple enough, we take a string guess
in, and return a boolean and a string for justification. Except sometimes (as
seen in Listing [[reject-reason-none]]) the reason is =None=, so that's more of an
=Optional= string, which we'll need to import.
#+CAPTION: Import type hints for function type definition
#+NAME: valid-stdlib
#+begin_src python :tangle no
from typing import Optional
#+end_src
#+CAPTION: Function signature without its content
#+NAME: valid-func-proto
#+begin_src python :tangle no
def check_valid_word(guess: str) -> tuple[bool, Optional[str]]:
#+end_src
#+CAPTION: Fill the function, to give valid-but-nonsensical output
#+NAME: valid-func-junk
#+begin_src python :tangle no
    """Pretends to check if guess is a valid word"""
    return False, "Not implemented"
#+end_src
All right, so we have tests, let's see them fail!
#+CAPTION: Run the tests. The =2>&1 || true= part is to ensure any failed test's output goes to stdout (in this document) and bad exit codes don't get marked as failures of the code block's execution.
#+NAME: valid-func-failrun1
#+begin_src shell :tangle no :exports both :async
make test 2>&1 || true
#+end_src
#+RESULTS:
#+begin_example
poetry run pytest
============================= test session starts ==============================
platform linux -- Python 3.9.5, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/jiby/dev/ws/short/literate_wordle/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/jiby/dev/ws/short/literate_wordle, configfile: pyproject.toml
plugins: cov-3.0.0, clarity-1.0.1
collecting ... collected 5 items
tests/test_checking_guess_valid_word.py::test_reject_long_words FAILED [ 20%]
tests/test_checking_guess_valid_word.py::test_reject_overly_short_words FAILED [ 40%]
tests/test_checking_guess_valid_word.py::test_reject_nondict_words FAILED [ 60%]
tests/test_checking_guess_valid_word.py::test_accept_dict_words FAILED [ 80%]
tests/test_pick_word.py::test_pick_word_ok_length PASSED [100%]
=================================== FAILURES ===================================
____________________________ test_reject_long_words ____________________________
def test_reject_long_words():
"""Scenario: Reject long words"""
# When guessing "affable"
guess = "affable"
is_valid, reject_reason = check_valid_word(guess)
# Then the guess is rejected
assert not is_valid, "Overly long guess should have been rejected"
# And reason for rejection is "Guess too long"
> assert reject_reason == "Guess too long"
E assert == failed. [pytest-clarity diff shown]
E
E LHS vs RHS shown below
E
E Not implemented
E Guess too long
E
tests/test_checking_guess_valid_word.py:39: AssertionError
________________________ test_reject_overly_short_words ________________________
def test_reject_overly_short_words():
"""Scenario: Reject short words"""
# When guessing "baby"
guess = "baby"
is_valid, reject_reason = check_valid_word(guess)
# Then the guess is rejected
assert not is_valid, "Overly short guess should have been rejected"
# And reason for rejection is "Guess too short"
> assert reject_reason == "Guess too short"
E assert == failed. [pytest-clarity diff shown]
E
E LHS vs RHS shown below
E
E Not implemented
E Guess too short
E
tests/test_checking_guess_valid_word.py:50: AssertionError
__________________________ test_reject_nondict_words ___________________________
def test_reject_nondict_words():
"""Scenario: Reject fake words via dictionary"""
# When guessing "vbpdj"
guess = "vbpdj"
is_valid, reject_reason = check_valid_word(guess)
# Then the guess is rejected
assert not is_valid, "Word not in dictionary should have been rejected"
# And reason for rejection is "Not a word from the dictionary"
> assert reject_reason == "Not a word from the dictionary"
E assert == failed. [pytest-clarity diff shown]
E
E LHS vs RHS shown below
E
E Not implemented
E Not a word from the dictionary
E
tests/test_checking_guess_valid_word.py:61: AssertionError
____________________________ test_accept_dict_words ____________________________
def test_accept_dict_words():
"""Scenario: Accept five letter dictionary words"""
# When guessing "crane"
guess = "crane"
is_valid, reject_reason = check_valid_word(guess)
# Then the guess is accepted
> assert is_valid, "Correct length word in dictionary should have been accepted"
E AssertionError: Correct length word in dictionary should have been accepted
E assert False
tests/test_checking_guess_valid_word.py:70: AssertionError
- generated xml file: /home/jiby/dev/ws/short/literate_wordle/test_results/results.xml -
----------- coverage: platform linux, python 3.9.5-final-0 -----------
Name Stmts Miss Cover
------------------------------------------------------------
src/literate_wordle/__init__.py 1 0 100%
src/literate_wordle/assets/__init__.py 0 0 100%
src/literate_wordle/words.py 14 0 100%
------------------------------------------------------------
TOTAL 15 0 100%
Coverage HTML written to dir test_results/coverage.html
Coverage XML written to file test_results/coverage.xml
=========================== short test summary info ============================
FAILED tests/test_checking_guess_valid_word.py::test_reject_long_words - asse...
FAILED tests/test_checking_guess_valid_word.py::test_reject_overly_short_words
FAILED tests/test_checking_guess_valid_word.py::test_reject_nondict_words - a...
FAILED tests/test_checking_guess_valid_word.py::test_accept_dict_words - Asse...
========================= 4 failed, 1 passed in 0.13s ==========================
make: *** [Makefile:16: test] Error 1
#+end_example
Test failure as expected, and enjoy that 100% coverage![fn::Obviously coverage
is a very fuzzy metric which doesn't guarantee much, but well-maintained code
tends to have good coverage, because its features are well tested. It's a
correlation metric, nothing more. In our case, we're doing TDD (tests go first)
and pushing it further by expressing our user requirements as acceptance
tests, so it should be no surprise the coverage is good.]
** Implementing the feature, one test at a time
Let's implement the proper feature. First, we replace the function stub's
body to do only guess-length checks, and run the tests against it. Since we implement
half the feature (by Scenarios), we should see roughly half as many tests fail as before.
#+NAME: valid-func-lenbody
#+begin_src python
    """Check wordle guess length only, no dict checks"""
    answer_length = 5
    guess_length = len(guess)
    if guess_length < answer_length:
        return False, "Guess too short"
    if guess_length > answer_length:
        return False, "Guess too long"
    return True, None  # No dictionary check
#+end_src
#+NAME: valid-func-failrun2
#+CAPTION: Similarly to Listing [[valid-func-failrun1]], run test without exiting on failure
#+begin_src shell :tangle no :exports both :async
make test 2>&1 || true
#+end_src
#+RESULTS:
#+begin_example
poetry run pytest
============================= test session starts ==============================
platform linux -- Python 3.9.5, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/jiby/dev/ws/short/literate_wordle/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/jiby/dev/ws/short/literate_wordle, configfile: pyproject.toml
plugins: cov-3.0.0, clarity-1.0.1
collecting ... collected 5 items
tests/test_checking_guess_valid_word.py::test_reject_long_words PASSED [ 20%]
tests/test_checking_guess_valid_word.py::test_reject_overly_short_words PASSED [ 40%]
tests/test_checking_guess_valid_word.py::test_reject_nondict_words FAILED [ 60%]
tests/test_checking_guess_valid_word.py::test_accept_dict_words PASSED [ 80%]
tests/test_pick_word.py::test_pick_word_ok_length PASSED [100%]
=================================== FAILURES ===================================
__________________________ test_reject_nondict_words ___________________________
def test_reject_nondict_words():
"""Scenario: Reject fake words via dictionary"""
# When guessing "vbpdj"
guess = "vbpdj"
is_valid, reject_reason = check_valid_word(guess)
# Then the guess is rejected
> assert not is_valid, "Word not in dictionary should have been rejected"
E AssertionError: Word not in dictionary should have been rejected
E assert not True
tests/test_checking_guess_valid_word.py:59: AssertionError
- generated xml file: /home/jiby/dev/ws/short/literate_wordle/test_results/results.xml -
----------- coverage: platform linux, python 3.9.5-final-0 -----------
Name Stmts Miss Cover
------------------------------------------------------------
src/literate_wordle/__init__.py 1 0 100%
src/literate_wordle/assets/__init__.py 0 0 100%
src/literate_wordle/words.py 19 0 100%
------------------------------------------------------------
TOTAL 20 0 100%
Coverage HTML written to dir test_results/coverage.html
Coverage XML written to file test_results/coverage.xml
=========================== short test summary info ============================
FAILED tests/test_checking_guess_valid_word.py::test_reject_nondict_words - A...
========================= 1 failed, 4 passed in 0.11s ==========================
make: *** [Makefile:16: test] Error 1
#+end_example
Progress! Four of five tests pass[fn::The two scenarios we haven't implemented
yet expect opposite values of the =is_valid= boolean, so it's normal that one of
them passes spuriously: our length-only function always answers =True= for
five-letter guesses, and a broken clock is right twice a day.], so we now need the dictionary.
Note that in Wordle's original implementation, the list of possible solutions is
a subset of the word dictionary used for guess validation. We previously loaded
the answers, now we need the larger set of accepted words. While it does mean
there will be duplicate entries, we're talking single-digit kilobytes, we can
afford that.
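Since every answer is supposed to also be an accepted guess, that subset
relationship is cheap to check with set difference. A minimal sketch, with a
hypothetical =answers_missing_from_dictionary= helper and tiny stand-in sets; in
the project, one would pass the answers and accepted-words sets loaded from the
two compressed files:
#+begin_src python :tangle no
def answers_missing_from_dictionary(answers: set[str], accepted: set[str]) -> set[str]:
    """Hypothetical check: which answer words are not accepted guesses?"""
    # Set difference keeps every answer absent from the accepted words
    return answers - accepted


# Usage with tiny stand-in sets; an empty result means every answer is accepted
print(answers_missing_from_dictionary({"crane", "stink"}, {"crane", "stink", "aahed"}))  # set()
#+end_src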
We fetch the dictionary like before:
#+begin_src shell :tangle no
wget \
--output-document "src/literate_wordle/assets/wordle_accepted_words_dict.txt" \
"https://raw.githubusercontent.com/AllValley/WordleDictionary/6f14d2f03d01c36fe66e3ccc0929394251ab139d/wordle_complete_dictionary.txt"
#+end_src
#+RESULTS:
And compress it too
#+begin_src shell :tangle no :exports both
ACCEPTED_FILE="src/literate_wordle/assets/wordle_accepted_words_dict.txt"
du -k "${ACCEPTED_FILE}"
gzip "${ACCEPTED_FILE}"
du -k "${ACCEPTED_FILE}.gz"
#+end_src
#+RESULTS:
: 92 src/literate_wordle/assets/wordle_accepted_words_dict.txt
: 36 src/literate_wordle/assets/wordle_accepted_words_dict.txt.gz
This time we shaved off closer to two thirds, sweet.
We reach for a new decompression function, but realize we already wrote all
this, just for a different filename. So let's make the extraction code more
generic.
One way to make it more generic is returning a =set= of strings, instead of the
previous =list=. This means we assume no ordering and use hash-based lookup,
rather than a strictly ordered sequence. After all, we won't iterate through the
words so much as look up arbitrary entries, so the =set= will provide
benefits down the line.
#+NAME: choice-func-unzipdict-generic1
#+CAPTION: Generic "unzip asset" function
#+begin_src python :tangle no
def get_asset_zip_as_set(asset_filename: str) -> set[str]:
    """Decompress a file in assets module into a set of words, separated by newline"""
    compressed_bytes = pkg_resources.read_binary(assets, asset_filename)
    string = gzip.decompress(compressed_bytes).decode("ascii")
    # Skip blank lines, so a trailing newline doesn't add "" to the set
    string_list = [word.strip().lower() for word in string.splitlines() if word.strip()]
    return set(string_list)
#+end_src
To avoid hardcoded filenames, we pull the file names out into constants, and
wrap the fetching of each file in its own function:
#+CAPTION: Magic strings defined as module-level constants
#+NAME: choice-magicstrings
#+begin_src python :tangle no
ANSWERS_FILENAME = "wordle_answers_dict.txt.gz"
ACCEPTED_FILENAME = "wordle_accepted_words_dict.txt.gz"
#+end_src
#+CAPTION: Wrappers to the specific files to grab
#+NAME: choice-func-getdicts
#+begin_src python :tangle no
def get_answers() -> set[str]:
    """Grab the Wordle answers as a set of string words"""
    return get_asset_zip_as_set(ANSWERS_FILENAME)


def get_accepted_words() -> set[str]:
    """Grab the Wordle accepted words dictionary as a set of string words"""
    return get_asset_zip_as_set(ACCEPTED_FILENAME)
#+end_src
And now we can use the dictionary as a set in our =check_valid_word= function:
#+CAPTION: Use the dictionary as a set to check =if guess in dictionary=
#+NAME: valid-func-len-dict
#+begin_src python
    """Check a wordle guess is valid: length and in dictionary"""
    answer_length = 5
    guess_length = len(guess)
    if guess_length < answer_length:
        return False, "Guess too short"
    if guess_length > answer_length:
        return False, "Guess too long"
    valid_words_dict = get_accepted_words()
    if guess in valid_words_dict:
        return True, None
    return False, "Not a word from the dictionary"
#+end_src
Small performance note: membership testing on a =set= (=guess in valid_words_dict=)
is =O(1)= on average, because the guess is hashed to find where it would be
stored, whereas on a =list= it's =O(n)= in dictionary size, comparing the guess
against each entry in turn. On a very long list of words that linear scan could
get expensive, hence using a =set= for lookups when we don't need sequential access.
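To get a feel for that difference, here is a rough micro-benchmark sketch using
the standard =timeit= module. The word list is a made-up stand-in (15,000
synthetic entries, an assumption rather than the real dictionary size), and the
probe is its last entry, the worst case for a =list= scan. Absolute timings will
vary by machine.

#+begin_src python :tangle no
import timeit

# Made-up stand-in for the accepted-words dictionary: 15,000 distinct entries
words_list = [f"w{i:05d}" for i in range(15_000)]
words_set = set(words_list)
probe = words_list[-1]  # worst case for the list: its very last entry

# Time 1,000 membership tests against each container
list_seconds = timeit.timeit(lambda: probe in words_list, number=1_000)
set_seconds = timeit.timeit(lambda: probe in words_set, number=1_000)
print(f"list: {list_seconds:.4f}s, set: {set_seconds:.6f}s")
#+end_src

The =set= figure should come out far smaller, since each lookup hashes the probe
once instead of comparing it against every entry.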
We change the invocation of =pick_answer_word= to use the new functions too:
#+NAME: choice-func-pickanswer-generic
#+CAPTION: Pick-a-word, revisited to use generic asset unzipping function. Note that =choice= needs a sequence it can index into, not just any iterable, hence the conversion back to a list
#+begin_src python :tangle no
def pick_answer_word() -> str:
    """Pick a single word out of the dictionary of answers"""
    return choice(list(get_answers()))
#+end_src
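Why the =list()= conversion matters: =random.choice= picks an element by index,
so it needs a sequence, and a =set= has no indexing. A tiny sketch, with a
made-up three-word answer set standing in for =get_answers()=:

#+begin_src python :tangle no
import random

answers = {"crane", "rebus", "abbey"}  # made-up stand-in for get_answers()

try:
    random.choice(answers)  # sets can't be indexed, so this raises TypeError
    set_choice_failed = False
except TypeError:
    set_choice_failed = True

# Converting to a list (as pick_answer_word does) gives choice() the sequence it
# needs; sorting first also pins the order, so a seeded generator repeats picks.
random.seed(42)
word = random.choice(sorted(answers))
print(set_choice_failed, word)
#+end_src

Sorting is an extra the code above doesn't do: string hashing is randomized per
interpreter run, so the iteration order of a =set= of strings (and therefore of
=list(get_answers())=) can differ between runs, and even a seeded =random=
module would then pick different words. Sorting pins the order when
reproducible picks matter, for example in tests.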
And we're done! Let's run our system through =make= again, to catch test
failures, but also to run the linters:
#+begin_src shell :tangle no :exports both :async
make
#+end_src
#+RESULTS:
#+begin_example
poetry install
Installing dependencies from lock file
No dependencies to install or update
Installing the current project: literate_wordle (0.1.0)
pre-commit run --all --all-files
Emacs export org-mode file to static HTML................................Passed
Trim Trailing Whitespace.................................................Passed
Fix End of Files.........................................................Passed
Check for added large files..............................................Passed
Check that executables have shebangs.................(no files to check)Skipped
Check for case conflicts.................................................Passed
Check vcs permalinks.....................................................Passed
Forbid new submodules....................................................Passed
Mixed line ending........................................................Passed
Check for merge conflicts................................................Passed
Detect Private Key.......................................................Passed
Check Toml...............................................................Passed
Check Yaml...............................................................Passed
Check JSON...........................................(no files to check)Skipped
black....................................................................Passed
isort (python)...........................................................Passed
mypy.....................................................................Passed
flake8...................................................................Passed
cd docs && make html
make[1]: Entering directory '/home/jiby/dev/ws/short/literate_wordle/docs'
Running Sphinx v4.4.0
Read in collections ...
wordle_html_export_filecopy: Initialised
gherkin_features_foldercopy: Initialised
gherkin_features_jinja: Initialised
Clean collections ...
gherkin_features_foldercopy: (CopyFolderDriver) Folder deleted: /home/jiby/dev/ws/short/literate_wordle/docs/source/_collections/gherkin_features/
gherkin_features_jinja: (JinjaDriver) Cleaning 1 jinja Based file/s ...
Executing collections ...
wordle_html_export_filecopy: (CopyFileDriver) Copy file...
gherkin_features_foldercopy: (CopyFolderDriver) Copy folder...
gherkin_features_jinja: (JinjaDriver) Creating 1 file/s from Jinja template...
loading pickled environment... done
[autosummary] generating autosummary for: _collections/gherkin_feature.md, index.rst, readme.md, wordle.md, wordle_sources.md
[AutoAPI] Reading files... [ 33%] /home/jiby/dev/ws/short/literate_wordle/src/literate_wordle/__init__.py
[AutoAPI] Reading files... [ 66%] /home/jiby/dev/ws/short/literate_wordle/src/literate_wordle/words.py
[AutoAPI] Reading files... [100%] /home/jiby/dev/ws/short/literate_wordle/src/literate_wordle/assets/__init__.py
[AutoAPI] Mapping Data... [ 33%] /home/jiby/dev/ws/short/literate_wordle/src/literate_wordle/__init__.py
[AutoAPI] Mapping Data... [ 66%] /home/jiby/dev/ws/short/literate_wordle/src/literate_wordle/words.py
[AutoAPI] Mapping Data... [100%] /home/jiby/dev/ws/short/literate_wordle/src/literate_wordle/assets/__init__.py
[AutoAPI] Rendering Data... [ 33%] literate_wordle
[AutoAPI] Rendering Data... [ 66%] literate_wordle.words
[AutoAPI] Rendering Data... [100%] literate_wordle.assets
myst v0.15.2: MdParserConfig(renderer='sphinx', commonmark_only=False, enable_extensions=['dollarmath'], dmath_allow_labels=True, dmath_allow_space=True, dmath_allow_digits=True, dmath_double_inline=False, update_mathjax=True, mathjax_classes='tex2jax_process|mathjax_process|math|output_area', disable_syntax=[], url_schemes=['http', 'https', 'mailto', 'ftp'], heading_anchors=2, heading_slug_func=None, html_meta=[], footnote_transition=True, substitutions=[], sub_delimiters=['{', '}'], words_per_minute=200)
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 5 source files that are out of date
updating environment: 0 added, 7 changed, 0 removed
reading sources... [ 14%] _collections/gherkin_feature
reading sources... [ 28%] autoapi/index
reading sources... [ 42%] autoapi/literate_wordle/assets/index
reading sources... [ 57%] autoapi/literate_wordle/index
reading sources... [ 71%] autoapi/literate_wordle/words/index
reading sources... [ 85%] wordle
reading sources... [100%] wordle_sources
Copying static files for sphinx-needs datatables support.../home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/datatables_loader.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/datatables.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/sphinx_needs_collapse.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/datatables.min.css /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/JSZip-2.5.0/jszip.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Buttons-1.5.1/js/buttons.print.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Buttons-1.5.1/js/buttons.flash.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Buttons-1.5.1/js/buttons.html5.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Buttons-1.5.1/js/buttons.colVis.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Buttons-1.5.1/js/dataTables.buttons.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Buttons-1.5.1/js/buttons.html5.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Buttons-1.5.1/css/common.scss /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Buttons-1.5.1/css/mixins.scss /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Buttons-1.5.1/css/buttons.dataTables.min.css 
/home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Buttons-1.5.1/swf/flashExport.swf /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/DataTables-1.10.16/js/jquery.dataTables.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/DataTables-1.10.16/css/jquery.dataTables.min.css /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/DataTables-1.10.16/images/sort_asc.png /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/DataTables-1.10.16/images/sort_desc_disabled.png /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/DataTables-1.10.16/images/sort_asc_disabled.png /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/DataTables-1.10.16/images/sort_both.png /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/DataTables-1.10.16/images/sort_desc.png /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/ColReorder-1.4.1/js/dataTables.colReorder.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/ColReorder-1.4.1/css/colReorder.dataTables.min.css /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/FixedColumns-3.2.4/js/dataTables.fixedColumns.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/FixedColumns-3.2.4/css/fixedColumns.dataTables.min.css /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Scroller-1.4.4/js/dataTables.scroller.min.js 
/home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Scroller-1.4.4/css/scroller.dataTables.min.css /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/FixedHeader-3.1.3/js/dataTables.fixedHeader.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/FixedHeader-3.1.3/css/fixedHeader.dataTables.min.css /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Responsive-2.2.1/js/dataTables.responsive.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/Responsive-2.2.1/css/responsive.dataTables.min.css /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/pdfmake-0.1.32/pdfmake.min.js /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/libs/html/pdfmake-0.1.32/vfs_fonts.js
Copying static files for sphinx-needs custom style support...[ 25%] common.css
Copying static files for sphinx-needs custom style support...[ 50%] /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/css/modern/layouts.css
Copying static files for sphinx-needs custom style support...[ 75%] /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/css/modern/styles.css
Copying static files for sphinx-needs custom style support...[100%] /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/css/modern/modern.css
looking for now-outdated files... none found
pickling environment... done
checking consistency... /home/jiby/dev/ws/short/literate_wordle/docs/source/autoapi/index.rst: WARNING: document isn't included in any toctree
done
preparing documents... done
writing output... [ 12%] _collections/gherkin_feature
writing output... [ 25%] autoapi/index
writing output... [ 37%] autoapi/literate_wordle/assets/index
writing output... [ 50%] autoapi/literate_wordle/index
writing output... [ 62%] autoapi/literate_wordle/words/index
writing output... [ 75%] index
writing output... [ 87%] wordle
writing output... [100%] wordle_sources
/home/jiby/dev/ws/short/literate_wordle/docs/source/_collections/gherkin_feature.md:34: WARNING: Any IDs not assigned for table node
generating indices... genindex py-modindex done
highlighting module code... [ 50%] literate_wordle
highlighting module code... [100%] literate_wordle.words
writing additional pages... search done
copying images... [ 50%] /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/images/feather_svg/arrow-down-circle.svg
copying images... [100%] /home/jiby/dev/ws/short/literate_wordle/.venv/lib/python3.9/site-packages/sphinxcontrib/needs/images/feather_svg/arrow-right-circle.svg
copying static files... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded, 2 warnings.
The HTML pages are in build/html.
Final clean of collections ...
wordle_html_export_filecopy: (CopyFileDriver) File deleted: /home/jiby/dev/ws/short/literate_wordle/docs/source/_collections/_static/wordle.html
gherkin_features_foldercopy: (CopyFolderDriver) Folder deleted: /home/jiby/dev/ws/short/literate_wordle/docs/source/_collections/gherkin_features/
gherkin_features_jinja: (JinjaDriver) Cleaning 1 jinja Based file/s ...
gherkin_features_jinja: (JinjaDriver) File deleted: /home/jiby/dev/ws/short/literate_wordle/docs/source/_collections/gherkin_feature.md
Checking sphinx-needs warnings
make[1]: Leaving directory '/home/jiby/dev/ws/short/literate_wordle/docs'
poetry run pytest
============================= test session starts ==============================
platform linux -- Python 3.9.5, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/jiby/dev/ws/short/literate_wordle/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/jiby/dev/ws/short/literate_wordle, configfile: pyproject.toml
plugins: cov-3.0.0, clarity-1.0.1
collecting ... collected 5 items
tests/test_checking_guess_valid_word.py::test_reject_long_words PASSED [ 20%]
tests/test_checking_guess_valid_word.py::test_reject_overly_short_words PASSED [ 40%]
tests/test_checking_guess_valid_word.py::test_reject_nondict_words PASSED [ 60%]
tests/test_checking_guess_valid_word.py::test_accept_dict_words PASSED [ 80%]
tests/test_pick_word.py::test_pick_word_ok_length PASSED [100%]
- generated xml file: /home/jiby/dev/ws/short/literate_wordle/test_results/results.xml -
----------- coverage: platform linux, python 3.9.5-final-0 -----------
Name Stmts Miss Cover
------------------------------------------------------------
src/literate_wordle/__init__.py 1 0 100%
src/literate_wordle/assets/__init__.py 0 0 100%
src/literate_wordle/words.py 23 0 100%
------------------------------------------------------------
TOTAL 24 0 100%
Coverage HTML written to dir test_results/coverage.html
Coverage XML written to file test_results/coverage.xml
============================== 5 passed in 0.09s ===============================
poetry build
Building literate_wordle (0.1.0)
- Building sdist
- Built literate_wordle-0.1.0.tar.gz
- Building wheel
- Built literate_wordle-0.1.0-py3-none-any.whl
#+end_example
Tests pass, coverage stays at 100%, and linters are quiet: this is great!
** Performance trick
We mentioned before that the whole dictionary gets unzipped on every request for
assets. Now that we're validating guessed words, we'll be processing guesses
quite often, certainly more often than we pick secret words!
To make all this fast, we want to cache the unzipped dictionary, so that
repeated calls to the function =get_asset_zip_as_set= skip opening and unzipping
the file, and just serve the already-decompressed words from memory again.
There's a handy Python decorator that does the trick! Let's add
=functools.cache= on top of our slow function:
#+NAME: valid-cache-import
#+CAPTION: Import the cache function
#+begin_src python :tangle no
from functools import cache
#+end_src
#+NAME: valid-cache-decorator
#+CAPTION: Decorator to make a function use cache
#+begin_src python :tangle no
@cache
#+end_src
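To see what =@cache= buys us, here is a sketch with a hypothetical =load_words=
stand-in that counts how many times its body actually runs. =functools.cache=
(Python 3.9+) behaves like =lru_cache(maxsize=None)=, so the usual
=cache_info()= statistics are available on the decorated function.

#+begin_src python :tangle no
from functools import cache

load_count = 0


@cache
def load_words(filename: str) -> frozenset[str]:
    """Hypothetical stand-in for get_asset_zip_as_set, counting uncached runs."""
    global load_count
    load_count += 1
    return frozenset({"cigar", "rebut", "sissy"})


for _ in range(3):
    load_words("wordle_answers_dict.txt.gz")

print(load_count)               # 1
print(load_words.cache_info())  # CacheInfo(hits=2, misses=1, maxsize=None, currsize=1)
#+end_src

One design point worth keeping in mind: a cached function hands back the very
same object on every call, so a caller mutating the returned =set= (say with
=.discard()=) would silently change it for every later caller. Returning a
=frozenset=, as this sketch does, is one way to rule that out.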
After rerunning our tests, we now have a (theoretically) faster function, yey!
Remember that we just committed a couple of optimization sins: optimizing
prematurely (with no proof of slowness), and optimizing without profiling
information, so we very likely just optimized something that isn't our
bottleneck. I'm fine with that, I just wanted to showcase this cool decorator,
which works as an unbounded memoizer. Let's see some quick before/after
performance numbers:
#+NAME: valid-perf-before
#+CAPTION: Before caching, running 5 batches of a thousand double-dict-unzip
#+begin_src shell :exports both
poetry run python3 -m timeit -v -n 1000 --setup "from literate_wordle.words import pick_answer_word, check_valid_word" "check_valid_word(pick_answer_word())"
#+end_src
#+RESULTS:
: raw times: 2.75 sec, 2.72 sec, 2.73 sec, 2.73 sec, 2.72 sec
:
: 1000 loops, best of 5: 2.72 msec per loop
And after caching:
#+begin_src shell :exports results
poetry run python3 -m timeit -v -n 1000 --setup "from literate_wordle.words import pick_answer_word, check_valid_word" "check_valid_word(pick_answer_word())"
#+end_src
#+RESULTS:
: raw times: 17.1 msec, 12.8 msec, 12.6 msec, 12.8 msec, 12.4 msec
:
: 1000 loops, best of 5: 12.4 usec per loop
That's over two orders of magnitude gained (2.72 msec down to 12.4 usec per
loop) for a single line of code changed. Sweet.
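Since the cache went in without any profiling, here is a sketch of how one could
have checked where the time actually goes first, using the standard library's
=cProfile= and =pstats= modules. The loader and checker below are hypothetical,
in-memory stand-ins for =get_asset_zip_as_set= and =check_valid_word=, with a
made-up 15,000-word dictionary.

#+begin_src python :tangle no
import cProfile
import gzip
import io
import pstats

# Made-up compressed "asset", built in memory instead of read from the package
COMPRESSED = gzip.compress("\n".join(f"w{i:05d}" for i in range(15_000)).encode("ascii"))


def load_words() -> set[str]:
    """Uncached loader: decompresses and splits the whole asset on every call."""
    return set(gzip.decompress(COMPRESSED).decode("ascii").split("\n"))


def check_word(word: str) -> bool:
    return word in load_words()


profiler = cProfile.Profile()
profiler.enable()
for _ in range(200):
    check_word("w00042")
profiler.disable()

# Show the 5 entries with the highest cumulative time
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
#+end_src

Sorting by cumulative time puts callers on top, so a report where most of
=check_word='s time is spent inside =load_words= would have been the evidence
that caching the loader is worthwhile.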
** Bug!
Doing some exploration of the accepted/answer word sets, I noticed an issue:
#+CAPTION: Count word size in each dictionary
#+begin_src python :tangle no :exports both :eval no-export
from literate_wordle.words import get_answers, get_accepted_words
answer_lengths = [len(word) for word in list(get_answers())]
accepted_lengths = [len(word) for word in list(get_accepted_words())]
print(set(answer_lengths))
print(set(accepted_lengths))
#+end_src
#+RESULTS:
: {0, 5}
: {0, 5}
Each set contains a 0-length word, in other words, the empty string.
This is likely a classic line-ending issue: the file ends with a line break
(DOS-style CRLF here), so =split("\n")= produces a final item that is empty, or
holds only a stray carriage return. Either way, =strip()= reduces it to the
empty string, which ends up in the set.
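A quick way to see the effect is to split a small, made-up CRLF-terminated
string the same way =get_asset_zip_as_set= does. =str.splitlines()= is shown
alongside as one possible alternative, since it treats =\r\n= as a single line
break and doesn't produce a trailing empty item.

#+begin_src python :tangle no
raw = "cigar\r\nrebut\r\nsissy\r\n"  # made-up DOS-style file contents

# Splitting on "\n" leaves "\r" on each word (removed by strip) and a trailing ""
words = [word.strip().lower() for word in raw.split("\n")]
print(words)  # ['cigar', 'rebut', 'sissy', '']

# splitlines() handles "\r\n" as one break and drops the final empty line
print(raw.splitlines())  # ['cigar', 'rebut', 'sissy']
#+end_src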
If this was a proper production issue we just discovered, we would first turn
the above snippet into a proper test case (asserting that no 0-length word
exists, and seeing it go red), commit that, raise it as a bug, and work on a
fix. But this code hasn't reached production yet, and the bug itself is minor
enough not to warrant that during our exploration phase.
We can fix this in multiple ways. We /could/ change the behaviour of the
=get_accepted_words= and =get_answers= functions (either via set operations
removing the empty item, returning =set(words) - set([""])=, or more likely by
skipping empty entries during iteration), but that wouldn't stop future users of
the buggy function =get_asset_zip_as_set= from hitting the same issue.
So let's fix it at the root, in the =get_asset_zip_as_set= function:
#+NAME: choice-func-unzipdict-generic2
#+CAPTION: Generic "unzip asset" function, filtering the whitespace-only words
#+begin_src python :tangle no
def get_asset_zip_as_set(asset_filename: str) -> set[str]:
    """Decompress a file in assets module into a set of words, separated by newline"""
    compressed_bytes = pkg_resources.read_binary(assets, asset_filename)
    string = gzip.decompress(compressed_bytes).decode("ascii")
    string_list = [word.strip().lower() for word in string.split("\n")]
    # Protect against whitespace-only lines during file-read causing empty stripped word
    non_empty_words = [word for word in string_list if len(word) != 0]
    return set(non_empty_words)
#+end_src
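To check the fixed logic without the packaged assets, here is an in-memory
sketch: =words_from_gzip_bytes= is a hypothetical helper mirroring the body of
=get_asset_zip_as_set= minus the =pkg_resources.read_binary= call, fed with gzip
bytes built on the fly. Its set comprehension's =if word= filter is equivalent
to the =len(word) != 0= check above.

#+begin_src python :tangle no
import gzip


def words_from_gzip_bytes(compressed_bytes: bytes) -> set[str]:
    """Hypothetical in-memory twin of get_asset_zip_as_set."""
    string = gzip.decompress(compressed_bytes).decode("ascii")
    stripped = [word.strip().lower() for word in string.split("\n")]
    # Empty strings are falsy, so "if word" drops them, like len(word) != 0
    return {word for word in stripped if word}


# Made-up CRLF content with a duplicate and mixed case
sample = gzip.compress(b"Cigar\r\nrebut\r\ncigar\r\n")
print(sorted(words_from_gzip_bytes(sample)))  # ['cigar', 'rebut']
#+end_src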
This was a good opportunity to play with List Comprehensions with filters, yey.
** Tangle out all the code
The last section of each heading of this document is used for internal purposes.
The code snippets defined above are usually out of order, especially the
imports, or functions defined once as stubs, then redefined with a proper
implementation.
To avoid having nonsense python file ordering, with import-feature-import-feature
sequences, which formatters would go crazy over, we define below the reordered
code blocks as they should be output, using the =noweb= feature of org-mode.
This lets us reference the code blocks above by name, and tangle them out into
the proper files, with the ordering and spacing one would expect of a real
codebase.
This means we need to manually weave the code blocks: instead of pointing them
all to the same file and relying on the snippets' top-to-bottom order, we now
have an explicit code block where we template out "add this bit, now 2 lines
below add that snippet, and then...". This isn't super pretty, but it gives
complete control over layout, like the number of blank lines between functions,
which was blocking adoption of the formatter "black" in this repository.
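For readers new to noweb, here is a toy model of what the expansion does: each
line holding a =<<name>>= reference is replaced by the body of the block with
that name. This is a hypothetical Python sketch (=expand= and the sample blocks
are made up), not how org-babel actually implements it; in particular, this
sketch only expands references sitting alone on their line, whereas org-babel
also handles text preceding a reference on its line.

#+begin_src python :tangle no
import re

# Hypothetical named blocks, standing in for org-mode "#+NAME:" source blocks
BLOCKS = {
    "module-docstring": '"""Pick words for a Wordle game"""',
    "imports": "import gzip\nfrom functools import cache",
    "constants": 'ANSWERS_FILENAME = "wordle_answers_dict.txt.gz"',
}

# Only matches a reference that is alone on its line
REFERENCE = re.compile(r"^<<([\w-]+)>>$", re.MULTILINE)


def expand(template: str, blocks: dict[str, str]) -> str:
    """Replace each <<name>> reference line with the body of the named block."""
    # A function replacement is used verbatim, so backslashes in bodies stay intact
    return REFERENCE.sub(lambda match: blocks[match.group(1)], template)


TEMPLATE = "<<module-docstring>>\n<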
First, we fix the =words.py= imports being out of order in our narrative by
tangling them via noweb, weaving the part 1 imports together with the part 2 ones. This means =isort=
(import sorter[fn::Sorts import-code first by category, then alphabetically.
Category of imports is in decreasing order: stdlib, then third party packages,
then local module imports.]) is now happy and won't thrash these python files.
Also insert the cache decorator before the assets function, and substitute the
=check_valid_word= function body with the real implementation instead of the
dummy function defined initially.
#+CAPTION: The final version of =words.py=. Note that behind the scenes this code block contains no code of its own, just references to other code blocks named above.
#+NAME: words-py-tangle
#+begin_src python :tangle src/literate_wordle/words.py :noweb yes
<<choice-module-docstring>>
<<choice-stdlib>>
<<valid-cache-import>>
<<choice-stdlib2>>
<<valid-stdlib>>
<<choice-locallib>>
<<choice-magicstrings>>
<<choice-func-getdicts>>
<<valid-cache-decorator>>
<<choice-func-unzipdict-generic2>>
<<choice-func-pickanswer-generic>>
<<valid-func-proto>>
<<valid-func-len-dict>>
#+end_src
Now the same thing with the tests file, which indeed /is/ in proper order
already, but would benefit from two-lines-between-tests to guarantee formatting:
#+CAPTION: Final version of =tests/test_checking_guess_valid_word.py=
#+NAME: test-valid-py-tangle
#+begin_src python :tangle tests/test_checking_guess_valid_word.py :noweb yes
<<test-valid-import>>
<<test-valid-1>>
<<test-valid-2>>
<<test-valid-3>>
<<test-valid-4>>
<<reject-reason-none>>
#+end_src
* Calculating guessed word's score
We can pick answer words, and we can check whether a guess is a valid word: now
we have everything we need to score the guess! Let's first define the overall
feature:
#+NAME: scoring-feature
#+begin_src feature :tangle features/scoring_guess.feature
Feature: Scoring guesses
  As a Wordle game
  I need to tell the player how good their guess is
  In order to help them find the proper answer
#+end_src
This sounds simple, but implementing this feature is tricky, because of edge
cases like multiple identical characters in the answer, which need to be colored
appropriately (what's the proper way to do that? No clue yet, but we need to pin
it down in requirements!). So again we'll define Gherkin Scenarios for that
Feature, to give examples of how the feature works in practice. So we write out:
#+NAME: scoring-scenario-perfect
#+CAPTION: The winning guess scenario. The green blocks are Unicode characters, may render differently on your device.
#+begin_src feature :tangle no
  Scenario: Perfect guess gives perfect score
    Given a wordle answer "crane"
    When scoring the guess "crane"
    Then score should be "🟩🟩🟩🟩🟩"
#+end_src
This seems easy enough, but we should notice that we're assuming the
guess is a valid word! We may want to just add another =Given=, like:
#+begin_src feature :tangle no
Given a guess that's a valid dictionary word
#+end_src
But this isn't just a hypothesis from the current scenario, it's valid for all
scenarios of this feature: every scoring of a guess requires the guess to be a
valid word. To avoid the tedious copying of that assumption in each Scenario, we
can use a Gherkin Background for the feature:
#+NAME: scoring-background
#+CAPTION: Pre-condition that applies to all the scenarios of this feature file
#+begin_src feature :tangle features/scoring_guess.feature
  Background:
    Given a guess that's a valid dictionary word
#+end_src
Perfect: we're now assuming the guess is a valid word, which means a
dependency on having implemented the previous feature, but we're not specifying
the guess word itself, which can still be scenario-specific. This makes our
initial "perfect guess" scenario valid again, so we can use it.
# Not-rendered, but since we're trying to avoid being out of order = having a
# final weave block at the end, we tangle the scenario for perfect score NOW
# after avoiding it due to the BACKGROUND step.
#+begin_src feature :tangle features/scoring_guess.feature :noweb yes :exports none
<<scoring-scenario-perfect>>
#+end_src
If we've got the perfect answer, let's have the opposite:
#+NAME: scoring-scenario-nogood
#+CAPTION: Flunking out scenario
#+begin_src feature :tangle features/scoring_guess.feature
  Scenario: No character in common
    Given a wordle answer "brave"
    When scoring the guess "skill"
    Then score should be "⬜⬜⬜⬜⬜"
#+end_src
Note that these scenarios don't make assumptions about how many attempts at
Wordle we're at, or about winning or losing. This is purely a hypothetical
example, disjoint from the actual playing of a Wordle game. We can deal with the
win/lose consequences later, once we have a proper scoring of guesses implemented.
** Can we start coding yet?
At this point, we /can/ conceivably start the implementation work: "let's go, we
have work to do!" And we can add the "🟨" scenario later once we have code that
works.
The problem of "what to do now" is interesting, because we can continue thinking
up scenarios in Gherkin for a while, or we could make a start writing test code
to match these claims, fix the red tests, implement towards green tests, and add
scenarios as we realize that our implementation is lacking compared to the
original intent of the game. That can certainly be done!
But while it's tempting to jump into code first, I strongly believe we as
developers should instead fully scope out the problem-space first. Pin down the
exact requirements (in this case via Gherkin features and scenarios), before
starting to touch any code. My reasoning is that [[https://jiby.tech/post/gherkin-features-user-requirements/][it's very easy to get tunnel
vision when writing code, getting excited about the programming problems, losing
track of what the "user" wants. We should instead write down the exact user
needs first]], and have a proper "ritual" for switching our "User" hat to a
"Developer" hat.
** Finalizing the scoring scenarios
So, back to our Gherkin scenarios, let's add the yellow marker one:
#+NAME: scoring-scenario-wrongplace
#+CAPTION: Character in the wrong place score
#+begin_src feature :tangle features/scoring_guess.feature
  Scenario: Character in wrong place
    Given a wordle answer "rebus"
    When scoring the guess "skull"
    Then score should be "🟨⬜🟨⬜⬜"
#+end_src
And to get a good sample of cases to test against, let's use a table of
examples to confirm that scoring works out in more cases:
#+NAME: scoring-scenario-multi
#+CAPTION: Many examples via Gherkin Scenario Outlines and Examples
#+begin_src feature :tangle features/scoring_guess.feature
  Scenario Outline: Scoring guesses
    Given a wordle <answer>
    When scoring <guess>
    Then score should be <score>
    # Emoji (Unicode) character rendering is hard:
    # Please forgive the table column alignment issues!
    Examples: A few guesses and their score
      | answer | guess | score |
      | adage  | adobe | 🟩🟩⬜⬜🟩 |
      | serif  | quiet | ⬜⬜🟨🟨⬜ |
      | raise  | radix | 🟩🟩⬜🟨⬜ |
#+end_src
Note how the "outline" system maps really well to the idea of "parametrized
tests". We can write the test case /once/, and have a decorator deal with the
multiple instantiations with different data.
All right, that's a few, moving on. Here is the corner case that's most
difficult to implement, written out as more examples of the previous scenario:
#+NAME: scoring-scenario-multi-identicalanswerchar
#+CAPTION: Edge case: duplicate character in answer or guess
#+begin_src feature :tangle features/scoring_guess.feature
    Examples: Multiple occurrences of same character
      | answer | guess | score |
      | abbey  | kebab | ⬜🟨🟩🟨🟨 |
      | abbey  | babes | 🟨🟨🟩🟩⬜ |
      | abbey  | abyss | 🟩🟩🟨⬜⬜ |
      | abbey  | algae | 🟩⬜⬜⬜🟨 |
      | abbey  | keeps | ⬜🟨⬜⬜⬜ |
      | abbey  | abate | 🟩🟩⬜⬜🟨 |
#+end_src
Because this edge case was worrisome for accuracy, these sample answers and
scores were [[https://nerdschalk.com/wordle-same-letter-twice-rules-explained-how-does-it-work/][taken from online example screenshots]] of the original Wordle
website, thus considered accurate references.
Thinking about it, with "abbey" as the answer, the score for "kebab" seems
logical: the first "b" occurrence matches as green, and the second is in the
wrong place. The surprise comes from "keeps", where the first "e" counts, but
the second has no equivalent left in the answer, hence is flagged as "no such
character". That makes sense, but it's not how a naive implementation of the
game would do it! That's why it's worth thinking about the full problem before
rushing the implementation.
# Seems to be that we need to count the answer's occurrences of each character,
# and while scoring guesses left-to-right, yellows and greens decrease the
# number of leftover matches, and when the number of matches is zero that's a non-match.
# That explains why guessing "kebab" for answer gets the first "b" marked green,
# as expected, and then the second is yellow (still remains values), whereas guessing
# "keeps" (one "e" in answer, two in guess) marks the first "e" as yellow, and
# with no more "e" in answer, the second is a bad match.
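To make the "leftover letters" reasoning concrete, here is a minimal sketch of
one scoring approach that reproduces every example in the tables above. It
assumes greens are settled first, in a separate pass, so that an early misplaced
letter can't use up a copy that a later position matches exactly; that ordering
is this sketch's assumption, as none of the examples above pin it down.
=sketch_score= is a hypothetical name, distinct from the =score_guess= function
developed below.

#+begin_src python :tangle no
from collections import Counter

GREEN, YELLOW, WHITE = "🟩", "🟨", "⬜"


def sketch_score(guess: str, answer: str) -> str:
    """Hypothetical two-pass scorer: exact matches first, then misplaced letters."""
    score = [WHITE] * len(guess)
    # First pass: mark greens, and count the answer letters they didn't use up
    leftover: Counter[str] = Counter()
    for position, (guessed, expected) in enumerate(zip(guess, answer)):
        if guessed == expected:
            score[position] = GREEN
        else:
            leftover[expected] += 1
    # Second pass, left to right: a non-green letter is yellow only while
    # the answer still has an unmatched copy of it
    for position, guessed in enumerate(guess):
        if score[position] == WHITE and leftover[guessed] > 0:
            score[position] = YELLOW
            leftover[guessed] -= 1
    return "".join(score)


print(sketch_score("keeps", "abbey"))  # ⬜🟨⬜⬜⬜
#+end_src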
Out of curiosity, I wonder if there's any wordle answers that contain three
identical characters? Let's see!
#+CAPTION: Regular-expression search for 3 repeated characters in the dictionary of answers
#+begin_src shell :exports both
zgrep -i -E "([a-z]).*\1.*\1" \
src/literate_wordle/assets/wordle_answers_dict.txt.gz \
| wc -l
#+end_src
#+RESULTS:
: 20
Really? 20? That's harsh ... show me one?
#+CAPTION: Reveal the first answer with 3 identical letters
#+begin_src shell :exports both
zgrep -i -E "([a-z]).*\1.*\1" \
src/literate_wordle/assets/wordle_answers_dict.txt.gz \
| head -n 1 \
| sed 's/\r//' # gets rid of CR characters in CRLF (DOS line endings)
#+end_src
#+RESULTS:
: bobby
Interesting. That must be hard to solve I imagine.
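The same question can be asked in Python, as a cross-check of the regular
expression: a word has three identical letters exactly when some letter's count
reaches 3, which =collections.Counter= computes directly. =has_triple_letter= is
a hypothetical helper name, not part of the project.

#+begin_src python :tangle no
from collections import Counter


def has_triple_letter(word: str) -> bool:
    """True when at least one character appears three or more times in the word."""
    return max(Counter(word.lower()).values(), default=0) >= 3


print(has_triple_letter("bobby"), has_triple_letter("abbey"))  # True False

# Hypothetical usage against the compressed answers file (needs "import gzip"):
# with gzip.open("src/literate_wordle/assets/wordle_answers_dict.txt.gz", "rt") as lines:
#     print(sum(has_triple_letter(line.strip()) for line in lines))
#+end_src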
** Writing up acceptance tests
With no more obvious pathological cases to cover in requirements, it's time to
switch to our developer hat, and write some (acceptance) tests!
#+NAME: scoring-test1
#+CAPTION: First acceptance test using "Perfect guess" scenario
#+begin_src python :tangle no
def test_perfect_guess():
    """Scenario: Perfect guess gives perfect score"""
    # Given a wordle answer "crane"
    answer = "crane"
    # When scoring the guess "crane"
    our_guess = "crane"
    score = score_guess(our_guess, answer)
    # Then score should be "🟩🟩🟩🟩🟩"
    assert score == "🟩🟩🟩🟩🟩", "Perfect answer should give Perfect Score"
#+end_src
A =score_guess= function? Sounds reasonable. We'll need to import it from a module...
#+NAME: scoring-test-import
#+CAPTION: Importing the newly thought-up function inside the test of Listing [[scoring-test1]]
#+begin_src python :tangle no
from literate_wordle.guess import score_guess
#+end_src
This means we now need to create such a module.
#+NAME: scoring-guessmod-header
#+CAPTION: New =guess.py= module, starting with docstring
#+begin_src python :tangle no
"""Score guesses of Wordle game"""
#+end_src
We already defined most of the function (name, module, output), so let's just
write a stub that will make tests go red.
#+NAME: scoring-guessfunc-proto1
#+CAPTION: =score_guess= stub to see the tests go red
#+begin_src python :tangle no
def score_guess(guess: str, answer: str) -> str:
    """Score an individual guess"""
    return "⬜"
#+end_src
Now the test should fail appropriately. Let's add a twist: we'll mark the test
function as expected to fail, because the feature isn't implemented yet. This
allows the test runner to report the suite as OK despite known failures, which
is perfect for known bugs being worked on, or new features being built.
Imagine if every time we built new features via TDD, the commit that adds the
test first made CI go red! No, we would rather have a nice "excuse" for this
new test to fail, and have the build stay green, "with an expected failure".
#+NAME: scoring-test-xfail
#+CAPTION: Decorator marking a test as expected to fail, "excusing" assertion failures
#+begin_src python :tangle no
@pytest.mark.xfail(reason="Not implemented yet")
#+end_src
In the case of a known bug, the =reason= field would very likely be a bug
identifier in the organisation's bug tracker.
#+NAME: scoring-test-import-pytest
#+CAPTION: Importing the pytest module to get the =pytest.mark.xfail= decorator
#+begin_src python :tangle no
import pytest
#+end_src
Confirm these tests work, marked as xfail ("eXpected FAILure"):
#+begin_src shell :exports both
make test
#+end_src
#+RESULTS:
#+begin_example
poetry run pytest
============================= test session starts ==============================
platform linux -- Python 3.9.5, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/jiby/dev/ws/short/literate_wordle/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/jiby/dev/ws/short/literate_wordle, configfile: pyproject.toml
plugins: cov-3.0.0, clarity-1.0.1
collecting ... collected 6 items
tests/test_checking_guess_valid_word.py::test_reject_long_words PASSED [ 16%]
tests/test_checking_guess_valid_word.py::test_reject_overly_short_words PASSED [ 33%]
tests/test_checking_guess_valid_word.py::test_reject_nondict_words PASSED [ 50%]
tests/test_checking_guess_valid_word.py::test_accept_dict_words PASSED [ 66%]
tests/test_pick_word.py::test_pick_word_ok_length PASSED [ 83%]
tests/test_scoring_guess.py::test_perfect_guess XFAIL (Not implement...) [100%]
- generated xml file: /home/jiby/dev/ws/short/literate_wordle/test_results/results.xml -
----------- coverage: platform linux, python 3.9.5-final-0 -----------
Name Stmts Miss Cover
------------------------------------------------------------
src/literate_wordle/__init__.py 1 0 100%
src/literate_wordle/assets/__init__.py 0 0 100%
src/literate_wordle/guess.py 2 0 100%
src/literate_wordle/words.py 25 0 100%
------------------------------------------------------------
TOTAL 28 0 100%
Coverage HTML written to dir test_results/coverage.html
Coverage XML written to file test_results/coverage.xml
========================= 5 passed, 1 xfailed in 0.10s =========================
#+end_example
Note that we now have regular tests that pass, plus this one test that fails as
expected, and =pytest=, expecting it, doesn't shout about the failure. Really handy.
Remember that "disabling" a test (marking it =pytest.mark.skip=) is different from
marking it =xfail=: skipping a test avoids running it, while =xfail= tests do
run, and their assertion failure is just not treated as critical. There's even an
option, =strict=True= on the marker (or the =xfail_strict= ini setting), to make
an =xpass= (an expected failure that ended up passing) an actual fatal testing
error, for the cases where it's important to track the failure itself.
** More tests
Let's implement the rest of the failing tests, so we can make it all red, then
fix the implementation:
#+NAME: scoring-test2
#+CAPTION: Second acceptance test using "no character in common" scenario
#+begin_src python :tangle no
def test_no_common_character():
    """Scenario: No character in common"""
    # Given a wordle answer "brave"
    answer = "brave"
    # When scoring the guess "skill"
    our_guess = "skill"
    score = score_guess(our_guess, answer)
    # Then score should be "⬜⬜⬜⬜⬜"
    assert score == "⬜⬜⬜⬜⬜", "No character in common with answer should give 0 score"
#+end_src
#+NAME: scoring-test3
#+CAPTION: Third acceptance test using "Characters in wrong place" scenario
#+begin_src python :tangle no
def test_wrong_place():
    """Scenario: Character in wrong place"""
    # Given a wordle answer "rebus"
    answer = "rebus"
    # When scoring the guess "skull"
    our_guess = "skull"
    score = score_guess(our_guess, answer)
    # Then score should be "🟨⬜🟨⬜⬜"
    assert score == "🟨⬜🟨⬜⬜", "Characters are in the wrong place"
#+end_src
That covers the first three scenarios.
For the Scenario Outline, it's interesting to notice that a pattern emerged,
which allows the same test skeleton to be reused with different data. In Pytest,
this is done by "parametrizing" the test with multiple data entries, via the
=pytest.mark.parametrize= decorator. Since we also want to cluster those entries
into distinct groups, we wrap each one in =pytest.param= and give it an =id=.
#+NAME: scoring-multi-skeleton
#+CAPTION: Generic acceptance test, without any data attached
#+begin_src python :tangle no
def test_generic_score(answer, our_guess, expected_score):
    """Scenario Outline: Scoring guesses"""
    # Given a wordle <answer>
    # When scoring <guess>
    score = score_guess(our_guess, answer)
    # Then score should be <score>
    assert score == expected_score
#+end_src
Just need to fill in the parameters:
#+NAME: scoring-multi-parameters
#+CAPTION: Parameters for generic test. Notice how =id= is used to cluster test data source, making =multi_occur= tests look separate from =normal_guess= ones.
#+begin_src python :tangle no
@pytest.mark.parametrize(
    "answer,our_guess,expected_score",
    [
        pytest.param("adage", "adobe", "🟩🟩⬜⬜🟩", id="normal_guess1"),
        pytest.param("serif", "quiet", "⬜⬜🟨🟨⬜", id="normal_guess2"),
        pytest.param("raise", "radix", "🟩🟩⬜🟨⬜", id="normal_guess3"),
        pytest.param("abbey", "kebab", "⬜🟨🟩🟨🟨", id="multi_occur1"),
        pytest.param("abbey", "babes", "🟨🟨🟩🟩⬜", id="multi_occur2"),
        pytest.param("abbey", "abyss", "🟩🟩🟨⬜⬜", id="multi_occur3"),
        pytest.param("abbey", "algae", "🟩⬜⬜⬜🟨", id="multi_occur4"),
        pytest.param("abbey", "keeps", "⬜🟨⬜⬜⬜", id="multi_occur5"),
        pytest.param("abbey", "abate", "🟩🟩⬜⬜🟨", id="multi_occur6"),
    ],
)
#+end_src
** Implementing the feature
With the strong test harness we have, this scoring function can be built
conveniently.
Let's experiment with the solution, iterating over naive solutions and seeing how
close each one gets to implementing the feature, measured by the number of tests
failed. This isn't strictly required, as we have already identified the edge
cases that make naive solutions break, but this is the fun experimenting part.
Before any actual code change, we first remove the "xfail" marker, so that test
failures are reported as real failures, now that we're actually implementing things.
#+CAPTION: A simple string matching by iterating over both lists at once
#+NAME: scoring-guessfunc-naive
#+begin_src python :tangle no
def score_guess(guess: str, answer: str) -> str:
    """Score an individual guess naively"""
    NO = "⬜"
    OK = "🟩"
    response = ""
    for answer_char, guess_char in zip(answer, guess):
        if answer_char == guess_char:
            response += OK
        else:
            response += NO
    return response
#+end_src
That only passes 3 of the 12 tests we just defined, obviously because we don't
deal with characters in the wrong place at all. So let's add tracking of
characters in the wrong places:
#+NAME: scoring-guessfunc-naive2
#+CAPTION: Keep track of all answer characters while iterating through both lists
#+begin_src python :tangle no
def score_guess(guess: str, answer: str) -> str:
    """Score an individual guess a little less naively"""
    NO = "⬜"
    OK = "🟩"
    WRONG_PLACE = "🟨"
    answer_chars_set = set(list(answer))
    response = ""
    for answer_char, guess_char in zip(answer, guess):
        if answer_char == guess_char:
            response += OK
        elif guess_char in answer_chars_set:
            response += WRONG_PLACE
        else:
            response += NO
    return response
#+end_src
That version now passes 9 of the 12 tests. The three failures (=multi_occur4=,
=multi_occur5= and =multi_occur6=) all involve a character repeated in the guess
but appearing only once in the answer, clearly an edge case we were fortunate to
identify early.
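To reproduce pass counts like these without going through pytest, a small
stand-alone harness can replay the twelve scenarios against any scoring function.
The names here (=SCENARIOS=, =failing_scenarios=, =score_guess_set=) are
hypothetical and not part of the project; the scenario data is copied from the
tests above, and =score_guess_set= is a copy of the set-based attempt.
#+CAPTION: Hypothetical harness replaying the scoring scenarios without pytest
#+begin_src python :tangle no
from typing import Callable, List, Tuple

# (answer, guess, expected score), copied from the acceptance tests above
SCENARIOS: List[Tuple[str, str, str]] = [
    ("crane", "crane", "🟩🟩🟩🟩🟩"),
    ("brave", "skill", "⬜⬜⬜⬜⬜"),
    ("rebus", "skull", "🟨⬜🟨⬜⬜"),
    ("adage", "adobe", "🟩🟩⬜⬜🟩"),
    ("serif", "quiet", "⬜⬜🟨🟨⬜"),
    ("raise", "radix", "🟩🟩⬜🟨⬜"),
    ("abbey", "kebab", "⬜🟨🟩🟨🟨"),
    ("abbey", "babes", "🟨🟨🟩🟩⬜"),
    ("abbey", "abyss", "🟩🟩🟨⬜⬜"),
    ("abbey", "algae", "🟩⬜⬜⬜🟨"),
    ("abbey", "keeps", "⬜🟨⬜⬜⬜"),
    ("abbey", "abate", "🟩🟩⬜⬜🟨"),
]


def failing_scenarios(scorer: Callable[[str, str], str]) -> List[Tuple[str, str]]:
    """Return the (answer, guess) pairs that the scorer gets wrong."""
    return [
        (answer, guess)
        for answer, guess, expected in SCENARIOS
        if scorer(guess, answer) != expected
    ]


def score_guess_set(guess: str, answer: str) -> str:
    """Copy of the set-based attempt above, for comparison."""
    answer_chars_set = set(answer)
    response = ""
    for answer_char, guess_char in zip(answer, guess):
        if answer_char == guess_char:
            response += "🟩"
        elif guess_char in answer_chars_set:
            response += "🟨"
        else:
            response += "⬜"
    return response


print(failing_scenarios(score_guess_set))
# [('abbey', 'algae'), ('abbey', 'keeps'), ('abbey', 'abate')]
#+end_src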
Looking at the examples, it seems that our scoring function needs to keep track of
how many occurrences of each character the answer contains overall, and grade a
guess character as "wrong place" only while occurrences remain, reducing the
counter each time.
Fortunately, Python's standard library provides a handy =Counter= class, which we can import:
#+NAME: scoring-guessfunc-import
#+CAPTION: Import the =Counter= class, which generates a dictionary of =item= to =count= on whatever it's given
#+begin_src python :tangle no
from collections import Counter
#+end_src
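Two properties of =Counter= matter for the scoring code that follows. A quick
standalone check (plain standard-library behaviour, nothing project-specific)
makes them concrete: reading a missing key gives =0= instead of raising
=KeyError=, and a count that drops to zero keeps its key, so an =in= check keeps
succeeding until the key is explicitly deleted.
#+CAPTION: Standalone check of the two =Counter= behaviours the scoring relies on
#+begin_src python :tangle no
from collections import Counter

answer_chars = Counter("abbey")
assert answer_chars["b"] == 2 and answer_chars["a"] == 1

# Missing keys read as 0 instead of raising KeyError (and are not inserted)
assert answer_chars["z"] == 0
assert "z" not in answer_chars

# A count that drops to 0 keeps its key, so `in` still answers True...
answer_chars["e"] -= 1
assert answer_chars["e"] == 0 and "e" in answer_chars

# ...which is why the scoring code deletes exhausted keys explicitly
del answer_chars["e"]
assert "e" not in answer_chars
#+end_src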
We want something like this:
#+begin_src python :tangle no
if guess_char in answer_chars and answer_chars[guess_char] > 0:
    response += WRONG_PLACE
    # Reduce occurrence count since we "used" this one
    answer_chars[guess_char] -= 1
    # No more hits = pretend character isn't even seen (remove from dict)
    if answer_chars[guess_char] == 0:
        del answer_chars[guess_char]
#+end_src
So we try the Counter way:
#+NAME: scoring-guessfunc-impl1
#+CAPTION: Use a =Counter= to handle multiple occurrences of a character
#+begin_src python :tangle no
def score_guess(guess: str, answer: str) -> str:
    """Score an individual guess with Counter"""
    NO = "⬜"
    OK = "🟩"
    WRONG_PLACE = "🟨"
    # Counter("abbey") = Counter({'b': 2, 'a': 1, 'e': 1, 'y': 1})
    answer_chars = Counter(answer)
    response = ""
    for answer_char, guess_char in zip(answer, guess):
        if answer_char == guess_char:
            response += OK
        elif guess_char in answer_chars and answer_chars[guess_char] > 0:
            response += WRONG_PLACE
            # Reduce occurrence count since we "used" this one
            answer_chars[guess_char] -= 1
            # No more hits = pretend character isn't even seen (remove from dict)
            if answer_chars[guess_char] == 0:
                del answer_chars[guess_char]
        else:
            response += NO
    return response
#+end_src
But while this improves the score, we are still 2 tests from success
(=multi_occur4= and =multi_occur6=)! Turns out we only reduced the counter for
yellows, not for greens. This needs a bit of reshuffling:
#+NAME: scoring-guessfunc-impl
#+CAPTION: Use a Counter, keeping track of both Green and Yellow
#+begin_src python :tangle no
def score_guess(guess: str, answer: str) -> str:
    """Score an individual guess with Counter"""
    NO = "⬜"
    OK = "🟩"
    WRONG_PLACE = "🟨"
    # Counter("abbey") = Counter({'b': 2, 'a': 1, 'e': 1, 'y': 1})
    answer_chars = Counter(answer)
    response = ""
    for guess_char, answer_char in zip(guess, answer):
        if guess_char not in answer_chars:
            response += NO
            continue  # Early exit for this character, skip to next
        # From here on, we MUST have a char in common, regardless of place
        if answer_char == guess_char:
            response += OK
        elif answer_chars[guess_char] > 0:
            response += WRONG_PLACE
        # Either way, reduce occurrence counter since we "used" this occurrence
        answer_chars[guess_char] -= 1
        # No more hits = pretend character isn't even seen (remove from dict)
        if answer_chars[guess_char] == 0:
            del answer_chars[guess_char]
    return response
#+end_src
Now that we're happy with this, we can refactor out the ugly hardcoded glyphs:
#+CAPTION: No more hardcoded glyphs
#+NAME: scoring-guess-enum
#+begin_src python :tangle no
class CharacterScore(str, Enum):
    """A single character's score"""

    OK = "🟩"
    NO = "⬜"
    WRONG_PLACE = "🟨"
#+end_src
#+CAPTION: Imports for the Enum
#+NAME: scoring-guess-enum-import
#+begin_src python :tangle no
from enum import Enum
#+end_src
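Mixing =str= into the enum is what lets the refactored function keep building
its response with =+== exactly as before: each member is itself a string holding
its glyph, and concatenating it onto a plain string yields a plain string. A quick
standalone check, redefining the same enum so the snippet runs on its own:
#+CAPTION: Standalone check of how a =str=-mixin enum behaves in string concatenation
#+begin_src python :tangle no
from enum import Enum


class CharacterScore(str, Enum):
    """A single character's score (same definition as above)"""

    OK = "🟩"
    NO = "⬜"
    WRONG_PLACE = "🟨"


# Thanks to the str mixin, each member *is* a str holding its glyph
assert isinstance(CharacterScore.OK, str)
assert CharacterScore.OK == "🟩"

# Concatenating members onto a plain str gives back a plain str,
# which is what `response += CharacterScore.OK` relies on
response = ""
for member in (CharacterScore.OK, CharacterScore.WRONG_PLACE, CharacterScore.NO):
    response += member
#+end_src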
And to use it as part of our scoring function:
#+NAME: scoring-guessfunc-impl2
#+CAPTION: Refactored function to avoid magic glyphs
#+begin_src python :tangle no
def score_guess(guess: str, answer: str) -> str:
    """Score an individual guess with Counter"""
    # Counter("abbey") = Counter({'b': 2, 'a': 1, 'e': 1, 'y': 1})
    answer_chars = Counter(answer)
    response = ""
    for guess_char, answer_char in zip(guess, answer):
        if guess_char not in answer_chars:
            response += CharacterScore.NO
            continue  # Early exit for this character, skip to next
        # From here on, we MUST have a char in common, regardless of place
        if answer_char == guess_char:
            response += CharacterScore.OK
        elif answer_chars[guess_char] > 0:
            response += CharacterScore.WRONG_PLACE
        # Either way, reduce occurrence counter since we "used" this occurrence
        answer_chars[guess_char] -= 1
        # No more hits = pretend character isn't even seen (remove from dict)
        if answer_chars[guess_char] == 0:
            del answer_chars[guess_char]
    return response
#+end_src
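One case that none of the scenarios above exercises: the guess has a letter in
the wrong place /before/ the same letter in the right place, while the answer
holds only one copy of it. With answer "those" and guess "geese" (my own example,
not from the feature file), the single left-to-right pass hands the answer's only
"e" to the yellow at the second position, so the genuinely correct final "e"
comes out grey. The commonly described Wordle rule gives exact matches priority,
which can be sketched as a two-pass variant. =score_guess_two_pass= is a
hypothetical name and a swapped-in technique, not this project's implementation;
=score_guess_left_to_right= is a copy of the function above, using raw glyphs.
#+CAPTION: Hypothetical greens-first (two-pass) scoring, compared with the single pass
#+begin_src python :tangle no
from collections import Counter

OK, WRONG_PLACE, NO = "🟩", "🟨", "⬜"


def score_guess_left_to_right(guess: str, answer: str) -> str:
    """Copy of the single-pass Counter version above, for comparison."""
    answer_chars = Counter(answer)
    response = ""
    for guess_char, answer_char in zip(guess, answer):
        if guess_char not in answer_chars:
            response += NO
            continue
        if answer_char == guess_char:
            response += OK
        elif answer_chars[guess_char] > 0:
            response += WRONG_PLACE
        answer_chars[guess_char] -= 1
        if answer_chars[guess_char] == 0:
            del answer_chars[guess_char]
    return response


def score_guess_two_pass(guess: str, answer: str) -> str:
    """Hypothetical greens-first variant: exact matches claim letters first."""
    result = [NO] * len(guess)
    leftover = Counter()  # answer letters not consumed by an exact match
    # Pass 1: mark exact matches, tallying the answer letters they leave unused
    for i, (guess_char, answer_char) in enumerate(zip(guess, answer)):
        if guess_char == answer_char:
            result[i] = OK
        else:
            leftover[answer_char] += 1
    # Pass 2: left to right, spend the leftover letters on wrong-place matches
    for i, (guess_char, answer_char) in enumerate(zip(guess, answer)):
        if guess_char != answer_char and leftover[guess_char] > 0:
            result[i] = WRONG_PLACE
            leftover[guess_char] -= 1
    return "".join(result)


print(score_guess_left_to_right("geese", "those"))  # ⬜🟨⬜🟩⬜
print(score_guess_two_pass("geese", "those"))  # ⬜⬜⬜🟩🟩
#+end_src
The two-pass sketch still gives the expected score for all twelve scenarios
above. Adding ("those", "geese", "⬜⬜⬜🟩🟩") as another =multi_occur= parameter
would be one way to pin this behaviour down in the test suite.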
** Tangle it all out
As before, we reorder the blocks from snippets above to export code in a way
that keeps proper formatting.
#+NAME: scoring-impl-tangleweb
#+CAPTION: Final =guess.py=
#+begin_src python :tangle no :noweb yes
<<scoring-guessmod-header>>
<<scoring-guessfunc-import>>
<<scoring-guess-enum-import>>
<<scoring-guess-enum>>
<<scoring-guessfunc-impl2>>
#+end_src
#+NAME: scoring-test1-tangleweb
#+CAPTION: Final =tests/test_scoring_guess.py=
#+begin_src python :tangle no :noweb yes
"""Validates the Gherkin file features/scoring_guess.feature:
<<scoring-feature>>
"""
<<scoring-test-import-pytest>>
<<scoring-test-import>>
<<scoring-test1>>
<<scoring-test2>>
<<scoring-test3>>
<<scoring-multi-parameters>>
<<scoring-multi-skeleton>>
#+end_src
* Playing a round of Wordle
With all the subfeatures we have, we're close to playing a round of Wordle: we're
only missing the "state" of the game board, along with the interactivity of the game.
#+NAME: track-guess-feat
#+begin_src feature :tangle no
Feature: Track number of guesses
As a Wordle game
I need to track how many guesses were already given
In order to announce win or game over
#+end_src
There are a few obvious cases we want to see:
#+CAPTION: Allow the first guess
#+NAME: track-guess-scenario1
#+begin_src feature :tangle no
Scenario: First guess is allowed
Given a wordle answer
And I didn't guess before
When I guess the word
Then my guess is scored
#+end_src
#+CAPTION: Still allow the sixth guess (skipping 2 through 5)
#+NAME: track-guess-scenario2
#+begin_src feature :tangle no
Scenario: Sixth guess still allowed
Given a wordle answer
And I guessed 5 times
When I guess the word
Then my guess is scored
#+end_src
#+CAPTION: Sixth guess is last guess (fail on seventh)
#+NAME: track-guess-scenario3
#+begin_src feature :tangle no
Scenario: Sixth failed guess is game over
Given a wordle answer
And I guessed 6 times already
When I guess the word
And my guess isn't the answer
Then my guess is scored
But game shows "Game Over"
And game shows the real answer
#+end_src
This feature shows us all the state we need to manage to track a Wordle game:
- an answer
- the number of previous guesses
- the previous guesses themselves? not needed after we print them
- the previous guesses' scores? not needed after we print them either
So a Wordle Game is the aggregate of "answer" + "number of guesses", nothing
else.
Let's write the test:
#+CAPTION: New =tests/test_track_guess_number.py= file with just feature's docstring
#+NAME: track-guess-test-docs
#+begin_src python :tangle no :noweb yes
"""Validates the Gherkin file features/track_guesses.feature
<<track-guess-feat2>>
"""
#+end_src
#+CAPTION: Test to track first guess
#+NAME: track-guess-test1
#+begin_src python :tangle no
def test_first_guess_allowed():
    """Scenario: First guess is allowed"""
    # Given a wordle answer
    answer = "orbit"
    # And I didn't guess before
    guess_number = 0
    game = WordleGame(answer=answer, guess_number=guess_number)
    # When I guess the word
    guess = "kebab"
    result = play_round(guess, game)
    # Then my guess is scored
    OUTCOME_CONTINUE = WordleMoveOutcome.GUESS_SCORED_CONTINUE
    assert result.outcome == OUTCOME_CONTINUE, "Game shouldn't be over yet"
    assert result.score is not None, "No score given as result"
    assert len(result.score) == 5, "Score of incorrect length"
    ALLOWED_CHARS = [score.value for score in Score]
    assert all(
        char in ALLOWED_CHARS for char in list(result.score)
    ), "Score doesn't match score's characters"
#+end_src
In the test above, I've done quite a bit of world-building:
- Used a new =WordleGame= structure keeping game state
- Used a new =WordleMoveOutcome= enumeration to describe outcomes
- Used a new =play_round= function that takes a guess and a game
- Implied, via the =result= variable, a structure holding the new game state after a move
#+CAPTION: The other imports for the test of Listing [[track-guess-test1]]
#+NAME: track-guess-test-import
#+begin_src python :tangle no :noweb yes
from literate_wordle.game import WordleGame, WordleMoveOutcome, play_round
from literate_wordle.guess import CharacterScore as Score
#+end_src
This practice of calling an API that doesn't exist yet is the coolest part of
TDD, because the tests lend their power to help design what the software should
feel like, even if we have no idea how to create the backend to that API yet.
Focusing on how the feature is /used/, rather than on how we envision the
backend (the usual engineering mindset), is very valuable.
All right, so with that in mind, let's start actually building these data
structures.
#+CAPTION: Enum for outcomes of a single move
#+NAME: track-guess-gamestate1
#+begin_src python :tangle no
class WordleMoveOutcome(Enum):
    """Outcome of a single move"""

    GAME_OVER_LOST = 1
    GAME_WON = 2
    GUESS_SCORED_CONTINUE = 3
#+end_src
#+CAPTION: Objects necessary to keep state of the game
#+NAME: track-guess-gamestate2
#+begin_src python :tangle no
@dataclass
class WordleGame:
    """A Wordle game's internal state, before a move is played"""

    answer: str
    guess_number: int


@dataclass
class WordleMove:
    """A Wordle game state once a move is played"""

    game: WordleGame
    outcome: WordleMoveOutcome
    message: str
    score: Optional[str]
#+end_src
#+CAPTION: Imports for enumeration of state and data-holding classes of Listing [[track-guess-gamestate1]],[[track-guess-gamestate2]]
#+NAME: track-guess-import-dataclass
#+begin_src python :tangle no
from dataclasses import dataclass
from enum import Enum
from typing import Optional
#+end_src
With the data structures ready, we can define our function's signature:
#+NAME: track-guess-proto
#+begin_src python :tangle no
def play_round(guess: str, game: WordleGame) -> WordleMove:
    """Use guess on the given game, resulting in WordleMove"""
#+end_src
Before we finish implementing this function, let's define the rest of the
acceptance tests we settled on in Gherkin:
#+CAPTION: Second test, for sixth guess still OK
#+NAME: track-guess-test2
#+begin_src python :tangle no
def test_sixth_guess_allowed():
    """Scenario: Sixth guess still allowed"""
    # Given a wordle answer
    answer = "orbit"
    # And I guessed 5 times
    guess_number = 6
    game = WordleGame(answer=answer, guess_number=guess_number)
    # When I guess the word
    guess = "kebab"
    result = play_round(guess, game)
    # Then my guess is scored
    OUTCOME_CONTINUE = WordleMoveOutcome.GUESS_SCORED_CONTINUE
    assert result.outcome == OUTCOME_CONTINUE, "Game shouldn't be over yet"
    assert result.score is not None, "No score given as result"
    assert len(result.score) == 5, "Score of incorrect length"
    OK_CHARS = ["🟩", "🟨", "⬜"]
    assert all(
        char in OK_CHARS for char in list(result.score)
    ), "Score doesn't match score's characters"
#+end_src
#+CAPTION: Actually denying a seventh guess
#+NAME: track-guess-test3
#+begin_src python :tangle no
def test_seventh_guess_fails_game():
    """Scenario: Sixth failed guess is game over"""
    # Given a wordle answer
    answer = "orbit"
    # And I guessed 6 times already
    # Guessing 6 times BEFORE, using seventh now:
    guess_number = 7
    game = WordleGame(answer, guess_number)
    # When I guess the word
    # And my guess isn't the answer
    guess = "kebab"
    result = play_round(guess, game)
    # Then my guess isn't scored
    assert result.outcome == WordleMoveOutcome.GAME_OVER_LOST, "Should have lost game"
    # But game shows "Game Over"
    assert "game over" in result.message.lower(), "Should show game over message"
    # And game shows the real answer
    assert answer in result.message
#+end_src
As I write the test in Listing [[track-guess-test3]], I notice there's one case of
the =enum= we haven't covered (=WordleMoveOutcome.GAME_WON=), which means the
=play_round= scenarios aren't complete yet. Let's add the scenario for winning
the game!
#+CAPTION: Winning scenario
#+NAME: track-guess-scenario4
#+begin_src feature :tangle no
Scenario: Winning guess
Given a wordle answer
And I guessed 3 times
When I guess the word
And my guess is the answer
Then my guess is scored
And score is perfect
And game shows "Game Won"
#+end_src
A little thought later, it seems we mixed up the requirements a little here (it
happens!). When designing the Gherkin Feature, we wrote about exhausting the
number of allowed guesses, but we weren't thinking of win/lose conditions. When
writing a =play_round= function, though, they are very relevant, especially since
the existing scenarios covered most of the cases already. Ideally, we would have
added a separate Feature describing winning and losing, and dealt with it
separately. In practice, here, it's simpler to just expand the feature's scope,
even if it means the scope has crept a little. This is what real engineering is
about: aiming for perfection, but making compromises to match our imperfect
world where deadlines and tired developers exist.
Let's fill in our winning case test:
#+CAPTION: Winning test
#+NAME: track-guess-test4
#+begin_src python :tangle no
def test_winning_guess_wins():
    """Scenario: Winning guess"""
    # Given a wordle answer
    answer = "orbit"
    # And I guessed 3 times
    guess_number = 3
    game = WordleGame(answer, guess_number)
    # When I guess the word
    # And my guess is the answer
    guess = answer
    result = play_round(guess, game)
    # Then my guess is scored
    assert result.score is not None, "Guess should be scored"
    # And the score is perfect
    assert result.score == "🟩🟩🟩🟩🟩"
    # And game shows "Game Won"
    assert result.outcome == WordleMoveOutcome.GAME_WON, "Should have won game"
    assert "game won" in result.message.lower()
#+end_src
With all the tests ready, we cobble together a stub for =play_round= to execute
the tests and see them go red.
#+CAPTION: stub for =play_round=, returning failure, to make tests run red
#+NAME: track-guess-impl-dummy
#+begin_src python :tangle no
result = WordleMoveOutcome.GAME_OVER_LOST
return WordleMove(game=game, outcome=result, message="You suck!", score=None)
#+end_src
All right, the tests do fail, right?
#+begin_src shell :tangle no :exports both
poetry run pytest 2>&1 || true
#+end_src
#+RESULTS:
#+begin_example
============================= test session starts ==============================
platform linux -- Python 3.9.5, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/jiby/dev/ws/short/literate_wordle/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/jiby/dev/ws/short/literate_wordle, configfile: pyproject.toml
plugins: cov-3.0.0, clarity-1.0.1
collecting ... collected 21 items
tests/test_checking_guess_valid_word.py::test_reject_long_words PASSED [ 4%]
tests/test_checking_guess_valid_word.py::test_reject_overly_short_words PASSED [ 9%]
tests/test_checking_guess_valid_word.py::test_reject_nondict_words PASSED [ 14%]
tests/test_checking_guess_valid_word.py::test_accept_dict_words PASSED [ 19%]
tests/test_pick_word.py::test_pick_word_ok_length PASSED [ 23%]
tests/test_scoring_guess.py::test_perfect_guess PASSED [ 28%]
tests/test_scoring_guess.py::test_no_common_character PASSED [ 33%]
tests/test_scoring_guess.py::test_wrong_place PASSED [ 38%]
tests/test_scoring_guess.py::test_generic_score[normal_guess1] PASSED [ 42%]
tests/test_scoring_guess.py::test_generic_score[normal_guess2] PASSED [ 47%]
tests/test_scoring_guess.py::test_generic_score[normal_guess3] PASSED [ 52%]
tests/test_scoring_guess.py::test_generic_score[multi_occur1] PASSED [ 57%]
tests/test_scoring_guess.py::test_generic_score[multi_occur2] PASSED [ 61%]
tests/test_scoring_guess.py::test_generic_score[multi_occur3] PASSED [ 66%]
tests/test_scoring_guess.py::test_generic_score[multi_occur4] PASSED [ 71%]
tests/test_scoring_guess.py::test_generic_score[multi_occur5] PASSED [ 76%]
tests/test_scoring_guess.py::test_generic_score[multi_occur6] PASSED [ 80%]
tests/test_track_guess_number.py::test_first_guess_allowed FAILED [ 85%]
tests/test_track_guess_number.py::test_sixth_guess_allowed FAILED [ 90%]
tests/test_track_guess_number.py::test_seventh_guess_fails_game FAILED [ 95%]
tests/test_track_guess_number.py::test_winning_guess_wins FAILED [100%]
=================================== FAILURES ===================================
___________________________ test_first_guess_allowed ___________________________
def test_first_guess_allowed():
"""Scenario: First guess is allowed"""
# Given a wordle answer
answer = "orbit"
# And I didn't guess before
guess_number = 0
game = WordleGame(answer=answer, guess_number=guess_number)
# When I guess the word
guess = "kebab"
result = play_round(guess, game)
# Then my guess is scored
OUTCOME_CONTINUE = WordleMoveOutcome.GUESS_SCORED_CONTINUE
> assert result.outcome == OUTCOME_CONTINUE, "Game shouldn't be over yet"
E AssertionError: Game shouldn't be over yet
E assert == failed. [pytest-clarity diff shown]
E
E LHS vs RHS shown below
E
E <WordleMoveOutcome.GAME_OVER_LOST: 1>
E <WordleMoveOutcome.GUESS_SCORED_CONTINUE: 3>
E
tests/test_track_guess_number.py:25: AssertionError
___________________________ test_sixth_guess_allowed ___________________________
def test_sixth_guess_allowed():
"""Scenario: Sixth guess still allowed"""
# Given a wordle answer
answer = "orbit"
# And I guessed 5 times
guess_number = 6
game = WordleGame(answer=answer, guess_number=guess_number)
# When I guess the word
guess = "kebab"
result = play_round(guess, game)
# Then my guess is scored
OUTCOME_CONTINUE = WordleMoveOutcome.GUESS_SCORED_CONTINUE
> assert result.outcome == OUTCOME_CONTINUE, "Game shouldn't be over yet"
E AssertionError: Game shouldn't be over yet
E assert == failed. [pytest-clarity diff shown]
E
E LHS vs RHS shown below
E
E <WordleMoveOutcome.GAME_OVER_LOST: 1>
E <WordleMoveOutcome.GUESS_SCORED_CONTINUE: 3>
E
tests/test_track_guess_number.py:46: AssertionError
_________________________ test_seventh_guess_fails_game _________________________
def test_seventh_guess_fails_game():
"""Scenario: Sixth failed guess is game over"""
# Given a wordle answer
answer = "orbit"
# And I guessed 6 times already
# Guessing 6 times BEFORE, using seventh now:
guess_number = 7
game = WordleGame(answer, guess_number)
# When I guess the word
# And my guess isn't the answer
guess = "kebab"
result = play_round(guess, game)
# Then my guess isn't scored
assert result.outcome == WordleMoveOutcome.GAME_OVER_LOST, "Should have lost game"
# But game shows "Game Over"
> assert "game over" in result.message.lower(), "Should show game over message"
E AssertionError: Should show game over message
E assert in failed. [pytest-clarity diff shown]
E
E LHS vs RHS shown below
E
E game over
E you suck!
E
tests/test_track_guess_number.py:69: AssertionError
___________________________ test_winning_guess_wins ____________________________
def test_winning_guess_wins():
"""Scenario: Winning guess"""
# Given a wordle answer
answer = "orbit"
# And I guessed 3 times
guess_number = 3
game = WordleGame(answer, guess_number)
# When I guess the word
# And my guess is the answer
guess = answer
result = play_round(guess, game)
# Then my guess is scored
> assert result.score is not None, "Guess should be scored"
E AssertionError: Guess should be scored
E assert is not failed. [pytest-clarity diff shown]
E
E LHS vs RHS shown below
E
E None
E
tests/test_track_guess_number.py:86: AssertionError
- generated xml file: /home/jiby/dev/ws/short/literate_wordle/test_results/results.xml -
----------- coverage: platform linux, python 3.9.5-final-0 -----------
Name Stmts Miss Cover
------------------------------------------------------------
src/literate_wordle/__init__.py 1 0 100%
src/literate_wordle/assets/__init__.py 0 0 100%
src/literate_wordle/game.py 20 0 100%
src/literate_wordle/guess.py 19 0 100%
src/literate_wordle/words.py 25 0 100%
------------------------------------------------------------
TOTAL 65 0 100%
Coverage HTML written to dir test_results/coverage.html
Coverage XML written to file test_results/coverage.xml
=========================== short test summary info ============================
FAILED tests/test_track_guess_number.py::test_first_guess_allowed - Assertion...
FAILED tests/test_track_guess_number.py::test_sixth_guess_allowed - Assertion...
FAILED tests/test_track_guess_number.py::test_sixth_guess_fails_game - Assert...
FAILED tests/test_track_guess_number.py::test_winning_guess_wins - AssertionE...
========================= 4 failed, 17 passed in 0.18s =========================
#+end_example
All right, let's implement this.
** Implementing the feature
First, if we have too many guesses already (before this one), we return game
lost. This means we decide to fail not at the end of the failed sixth guess, but
at the beginning of the seventh.
#+CAPTION: Game over detection. Notice the early-exit pattern.
#+NAME: track-guess-impl1
#+begin_src python :tangle no
if game.guess_number >= 7:
    message = f"Too many guesses: Game Over. Answer was: {game.answer}"
    outcome = WordleMoveOutcome.GAME_OVER_LOST
    return WordleMove(game=game, outcome=outcome, message=message, score=None)
#+end_src
In order to count a guess, it needs to be a valid word. This means importing
some of our package's functions.
#+NAME: track-guess-import-module-tooshort
#+begin_src python :tangle no
from literate_wordle.guess import score_guess
from literate_wordle.words import check_valid_word
#+end_src
As we write the code to check whether the guess is a valid word, we notice that
if the word isn't valid, we can't return =GUESS_SCORED_CONTINUE=, because an
invalid-word guess shouldn't be counted against the player! So we revise the
=WordleMoveOutcome= enum again, and because it's a new enum case, we will need to
add a test for it to cover all bases! Let's put a pin in that, and finish
implementing this first.
#+NAME: track-guess-enum4
#+CAPTION: Fourth outcome: guess wasn't valid, not counted
#+begin_src python :tangle no
GUESS_NOTVALID_CONTINUE = 4
#+end_src
To compensate for having this enum defined all out of order, we'll again use the
=noweb= feature to weave code back into the enum in the subsection below,
inserting this fourth possibility in the correct place, so the code looks like
it should.
#+CAPTION: Invalid word detection.
#+NAME: track-guess-impl2
#+begin_src python :tangle no
valid, validity_msg = check_valid_word(guess)
if not valid and validity_msg is not None:
    outcome = WordleMoveOutcome.GUESS_NOTVALID_CONTINUE
    return WordleMove(game=game, outcome=outcome, message=validity_msg, score=None)
#+end_src
Now we've gotten rid of the cases where the guess was invalid.
#+CAPTION: Guess is now valid, count it
#+NAME: track-guess-impl3
#+begin_src python :tangle no
# Guess now guaranteed to be valid: count it
game.guess_number += 1
#+end_src
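Since =WordleGame= is a regular (mutable) dataclass, =game.guess_number += 1=
updates the very object the caller passed in, and the returned =WordleMove.game=
points at that same object. If one preferred =play_round= to leave its input
untouched, a possible variant (a sketch, not what this document does) uses the
standard =dataclasses.replace= to build an updated copy. =count_guess_in_place=
and =count_guess_copy= are hypothetical helper names for the comparison.
#+CAPTION: Hypothetical comparison of in-place counting with =dataclasses.replace=
#+begin_src python :tangle no
from dataclasses import dataclass, replace


@dataclass
class WordleGame:
    """Same shape as the game state defined earlier"""

    answer: str
    guess_number: int


def count_guess_in_place(game: WordleGame) -> WordleGame:
    """What the fragment above does: mutate the given game."""
    game.guess_number += 1
    return game


def count_guess_copy(game: WordleGame) -> WordleGame:
    """Hypothetical alternative: return an updated copy, input untouched."""
    return replace(game, guess_number=game.guess_number + 1)


original = WordleGame(answer="orbit", guess_number=0)
copied = count_guess_copy(original)  # original still has guess_number 0 here
mutated = count_guess_in_place(original)  # now original itself is updated
#+end_src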
#+CAPTION: The now-guaranteed-valid guess should be scored
#+NAME: track-guess-impl4
#+begin_src python :tangle no
score = score_guess(guess, game.answer)
#+end_src
#+CAPTION: Knowing a valid guess, scored, is game won?
#+NAME: track-guess-impl5-hardcoded
#+begin_src python :tangle no
if score == "🟩🟩🟩🟩🟩":
    outcome = WordleMoveOutcome.GAME_WON
    message = f"Correct! Game won in {game.guess_number - 1} guesses"
    return WordleMove(game=game, outcome=outcome, message=message, score=score)
#+end_src
Hmm, but wouldn't it be nice to avoid this hardcoded blob?
Let's extend =CharacterScore= to provide it.
#+CAPTION: A property of the class that returns The Perfect Score
#+NAME: track-guess-perfectscore
#+begin_src python :tangle no
@classmethod
@property
def perfect_score(cls) -> str:
    """All-good Wordle score for perfect guess"""
    return "".join([cls.OK] * 5)
#+end_src
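The chained =@classmethod= and =@property= decorators above rely on class
methods wrapping other descriptors, a feature added in Python 3.9 (the version
used here) that was deprecated in Python 3.11 and removed in 3.13. On newer
interpreters, =CharacterScore.perfect_score= would no longer evaluate to the
score string. A minimal sketch of a more portable variant is a plain class
method called as =CharacterScore.perfect_score()=. It /assumes/ that
=CharacterScore= is a string-valued =Enum= holding the three emojis seen in the
tests; the real class is defined in an earlier section.
#+begin_src python :tangle no
from enum import Enum


class CharacterScore(str, Enum):
    """ASSUMED shape of the project's enum: string-valued members"""

    OK = "🟩"
    NO = "⬜"
    WRONG_PLACE = "🟨"

    @classmethod
    def perfect_score(cls) -> str:
        """All-good Wordle score for a perfect (5 letter) guess"""
        # A plain classmethod: no descriptor chaining, works on any Python 3
        return cls.OK.value * 5


print(CharacterScore.perfect_score())  # 🟩🟩🟩🟩🟩
#+end_src
The trade-off is an extra pair of parentheses at each call site, in exchange
for not depending on version-specific descriptor behaviour.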
#+CAPTION: Knowing a valid guess, scored, is game won? (Without hardcoded string)
#+NAME: track-guess-impl5
#+begin_src python :tangle no
if score == CharacterScore.perfect_score:
    outcome = WordleMoveOutcome.GAME_WON
    message = f"Correct! Game won in {game.guess_number - 1} guesses"
    return WordleMove(game=game, outcome=outcome, message=message, score=score)
#+end_src
#+CAPTION: Re-do the imports, because we needed CharacterScore too
#+NAME: track-guess-import-module
#+begin_src python :tangle no
from literate_wordle.guess import CharacterScore, score_guess
from literate_wordle.words import check_valid_word
#+end_src
#+CAPTION: Last possibility remains: scored, not won/lost, try another guess
#+NAME: track-guess-impl6
#+begin_src python :tangle no
# Only case left is "try another guess"
outcome = WordleMoveOutcome.GUESS_SCORED_CONTINUE
message = f"Try again! Guess number {game.guess_number - 1}. Score is: {score}"
return WordleMove(game=game, outcome=outcome, message=message, score=score)
#+end_src
Note that throughout this codebase, we made a lot of assumptions and
repetitions around the length of a Wordle answer/guess, and this translates to
repeated hardcoded-ness like the emoji string above. Each of these could have
been addressed right away during implementation (as we just did for the perfect
score), but it's important to consider whether the scope increase is worth it:
generalizing Wordle to N characters isn't super interesting to me, as it would
require cutting new dictionaries, etc, and I'm just not that into Wordle. This
is the kind of technical design decision we can make by having a firm grasp on
project scope, another advantage of deeply understanding project requirements.
Back to the implementation: tests should all pass now, =make= is happy, but
there's an interesting issue:
#+begin_example
----------- coverage: platform linux, python 3.9.5-final-0 -----------
Name                                     Stmts   Miss  Cover
------------------------------------------------------------
src/literate_wordle/__init__.py              1      0   100%
src/literate_wordle/assets/__init__.py       0      0   100%
src/literate_wordle/game.py                 38      2    95%
src/literate_wordle/guess.py                19      0   100%
src/literate_wordle/words.py                25      0   100%
------------------------------------------------------------
TOTAL                                       83      2    98%
Coverage HTML written to dir test_results/coverage.html
Coverage XML written to file test_results/coverage.xml
#+end_example
We lowered coverage, nooo! Exploring the coverage HTML file in a browser, we see
that the uncovered lines are:
#+CAPTION: Lines of code that weren't hit in any test, as witnessed by test coverage library
#+begin_src python :tangle no
if not valid and validity_msg is not None:
    outcome = WordleMoveOutcome.GUESS_NOTVALID_CONTINUE
    return WordleMove(game=game, outcome=outcome, message=validity_msg, score=None)
#+end_src
Oh! That's the test case we put a pin in! Right, so we're back to writing that
test. I wonder if we should write a whole scenario to back it up? It's not
really obvious!
If this test case spins out of an edge case of our implementation, it's not
really coming from a business requirement, so it's probably not worth writing a
Gherkin Scenario alongside the other ones. If it is indeed an overlooked
requirement, then yes, add it to the requirements pile and write a feature.
Hmm, let's write the test first, and see if the scenario that emerges is a
requirement.
#+NAME: track-guess-test5
#+begin_src python :tangle no
def test_invalid_guess_not_counted():
    """Scenario: Invalid guess isn't counted"""
    # Given a wordle answer
    answer = "orbit"
    # And I guessed 3 times
    guess_number = 3
    game = WordleGame(answer=answer, guess_number=guess_number)
    # When I guess the word
    # But my guess isn't a dictionary word
    guess = "xolfy"
    result = play_round(guess, game)
    # Then my guess is rejected as invalid word
    OUTCOME_BADWORD = WordleMoveOutcome.GUESS_NOTVALID_CONTINUE
    assert result.outcome == OUTCOME_BADWORD, "Guess should have been rejected"
    # And my guess is not scored
    assert result.score is None, "No score should be given on bad word"
#+end_src
Hmm, after some thought, it seems that the function we implemented, compared to
the feature being described in Gherkin, is indeed different!
As mentioned before, the Gherkin feature was about tracking a specific number of
guesses, but we increased scope to consider the wider win scenario, using the
"play round" feature. Having expanded the feature to cover more than counting
guesses, our code now also needs to understand whether the guess is the correct
word or not.
So for the specific purpose of tracking guesses as a feature, we're already
covered by existing scenarios. But beyond the implementation edge cases that
the coverage metrics revealed, what we built is really the wider feature that a
/play a round/ Feature would cover.
This game's implementation being so very near completion, I am not interested
in creating another feature file, so I'll just expand the original feature a
bit to be about playing a whole round, wins and losses included, just to keep
this narrative barely on track.
#+NAME: track-guess-feat2
#+begin_src feature :tangle no
Feature: Playing a round
As a Wordle game
I need to track how many guesses were already given, stating wins/losses
In order to play the game
#+end_src
#+CAPTION: New scenario
#+NAME: track-guess-scenario5
#+begin_src feature :tangle no
Scenario: Invalid guess isn't counted
Given a wordle answer
And I guessed 3 times
When I guess the word
But my guess isn't a dictionary word
Then my guess is rejected as invalid word
And my guess is not scored
#+end_src
And with this new test, we're back to passing tests and 100% coverage!
** Tangling out the whole thing
The feature first:
#+begin_src feature :tangle features/track_guesses.feature :noweb yes
<<track-guess-feat2>>
<<track-guess-scenario1>>
<<track-guess-scenario2>>
<<track-guess-scenario3>>
<<track-guess-scenario4>>
<<track-guess-scenario5>>
#+end_src
The tests:
#+begin_src python :tangle tests/test_track_guess_number.py :noweb yes
<<track-guess-test-docs>>
<<track-guess-test-import>>
<<track-guess-test1>>
<<track-guess-test2>>
<<track-guess-test3>>
<<track-guess-test4>>
# Case covered by existing gherkin feature:
# Intentional, see wordle.org for reasoning
<<track-guess-test5>>
#+end_src
#+begin_src python :tangle src/literate_wordle/game.py :noweb yes
"""Wordle game's state and playing rounds"""
<<track-guess-import-dataclass>>
<<track-guess-import-module>>
<<track-guess-gamestate1>>
<<track-guess-enum4>>
<<track-guess-gamestate2>>
<<track-guess-proto>>
<<track-guess-impl1>>
<<track-guess-impl2>>
<<track-guess-impl3>>
<<track-guess-impl4>>
<<track-guess-impl5>>
<<track-guess-impl6>>
#+end_src
And remember that we had to expand the =CharacterScore=, so we need to re-tangle
it here:
#+NAME: scoring-impl-tangleweb2
#+begin_src python :tangle no :noweb yes
<<scoring-guessmod-header>>
<<scoring-guessfunc-import>>
<<scoring-guess-enum-import>>
<<scoring-guess-enum>>
<<track-guess-perfectscore>>
<<scoring-guessfunc-impl2>>
#+end_src
* Final round: command line interface
We have assembled enough Lego bricks to play a single round, so the product is
almost finished. Let's give this project a shell command to invoke, tying
together all the other disjointed features.
#+CAPTION: Final feature of our system: play the game!
#+begin_src feature :tangle features/command_line_entrypoint.feature
Feature: Pywordle shell command
As a Wordle game
I need a shell command to launch the game
In order to give convenient entrypoint for players
#+end_src
I don't think it's necessary to give specific scenarios, because we've
thoroughly tested the underlying implementation of the game, we just need to
assemble it into a shell command.
So let's define an entrypoint for the game, starting with a function that
generates a new game:
#+CAPTION: Create a new game (object)
#+NAME: cli-main1
#+begin_src python :tangle no
def new_game() -> WordleGame:
    """Generate a new WordleGame"""
    return WordleGame(answer=pick_answer_word(), guess_number=1)
#+end_src
And how to play until the game is won or lost, reporting each response as we go:
#+CAPTION: The main game loop. Notice the trick: dependency injection of all input and output as "callables", which we can inject later to be =print= and =input=, or automate instead for testing.
#+NAME: cli-main2
#+begin_src python :tangle no
def play_game(game: WordleGame, guess_fetcher: Callable, response_logger: Callable):
    """Plays the given WordleGame until completion.

    Asks guess_fetcher for guess, and sends response to response_logger
    """
    outcome = WordleMoveOutcome.GUESS_SCORED_CONTINUE  # Gotta start somehow
    while outcome not in {WordleMoveOutcome.GAME_WON, WordleMoveOutcome.GAME_OVER_LOST}:
        guess = guess_fetcher()
        result = play_round(guess=guess, game=game)
        response_logger(result.message)
        game = result.game
        outcome = result.outcome
#+end_src
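Because both the guess source and the response sink are injected, a test can
swap =input= for a scripted iterator and =print= for a list's =append=. Here is
a self-contained sketch of that idea. =Game=, =Move=, =Outcome= and
=fake_play_round= are /hypothetical/ stand-ins for =WordleGame=, =WordleMove=,
=WordleMoveOutcome= and =play_round=, kept tiny so the example runs on its own;
the stub's win/lose rules are assumptions, not the real game's scoring.
#+begin_src python :tangle no
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Outcome(Enum):
    """Hypothetical stand-in for WordleMoveOutcome"""

    GUESS_SCORED_CONTINUE = 1
    GAME_WON = 2
    GAME_OVER_LOST = 3


@dataclass
class Game:
    """Hypothetical stand-in for WordleGame"""

    answer: str
    guess_number: int = 1


@dataclass
class Move:
    """Hypothetical stand-in for WordleMove (no score needed here)"""

    game: Game
    outcome: Outcome
    message: str


def fake_play_round(guess: str, game: Game) -> Move:
    """Hypothetical play_round stub: win on exact match, lose on 6th miss"""
    game.guess_number += 1
    if guess == game.answer:
        return Move(game, Outcome.GAME_WON, f"Won in {game.guess_number - 1}")
    if game.guess_number > 6:
        return Move(game, Outcome.GAME_OVER_LOST, "Lost")
    return Move(game, Outcome.GUESS_SCORED_CONTINUE, "Try again")


def play_game(
    game: Game,
    guess_fetcher: Callable[[], str],
    response_logger: Callable[[str], None],
) -> None:
    """Same loop shape as play_game above, calling the stub instead"""
    outcome = Outcome.GUESS_SCORED_CONTINUE
    while outcome not in {Outcome.GAME_WON, Outcome.GAME_OVER_LOST}:
        result = fake_play_round(guess=guess_fetcher(), game=game)
        response_logger(result.message)
        game = result.game
        outcome = result.outcome


# Scripted guesses replace input(); a list's append replaces print()
guesses = iter(["hello", "lobes", "novel"])
log: List[str] = []
play_game(Game(answer="novel"), guess_fetcher=guesses.__next__, response_logger=log.append)
print(log)  # ['Try again', 'Try again', 'Won in 3']
#+end_src
If the script runs out of guesses before the game ends, =__next__= raises
=StopIteration=, so a test with too few guesses fails loudly instead of hanging
on =input()=.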
Pepper in the few imports we need:
#+CAPTION: Callable is the type of anything that can be called (functions, but also classes, or objects with a =__call__= method...)
#+NAME: cli-main-import-std
#+begin_src python :tangle no
from typing import Callable
#+end_src
#+CAPTION: Other functions we referenced
#+NAME: cli-main-import-mod
#+begin_src python :tangle no
from literate_wordle.game import WordleGame, WordleMoveOutcome, play_round
from literate_wordle.words import pick_answer_word
#+end_src
Now we can add command line argument parsing in a separate file:
#+CAPTION: An argument parser with no argument, to give us usage via =--help= flags. Notice that =raw_args= is optional and defaults to =None=: passing a list lets us call this function ourselves to test it, while =None= makes =argparse= use =sys.argv= automatically.
#+NAME: cli-pargs1
#+begin_src python :tangle no
def parse_args(raw_args: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse given command line arguments"""
    description = "Wordle implementation in Python, as literate programming"
    # Bit overkill since there is no real argument to parse yet
    parser = argparse.ArgumentParser(prog="pywordle", description=description)
    return parser.parse_args(raw_args)
#+end_src
#+CAPTION: The Python arguments parsing library. =click= is fancier, but this is neat and self-contained.
#+NAME: cli-pargs-import-std1
#+begin_src python :tangle no
import argparse
#+end_src
#+CAPTION: Typing hints for our arguments. =Sequence= is a generalized version of List: any ordered container supporting indexing and =len()=, so a tuple would match too.
#+NAME: cli-pargs-import-std2
#+begin_src python :tangle no
from typing import Optional, Sequence
#+end_src
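Since =raw_args= is injectable, the parser can be exercised directly without
touching =sys.argv=. Here is a minimal sketch that repeats =parse_args= so it runs
standalone. The =help_text= helper is /hypothetical/ and not part of the
project. =argparse= prints the help to stdout and then raises =SystemExit=, so
the sketch captures stdout and catches the exit.
#+begin_src python :tangle no
import argparse
import contextlib
import io
from typing import Optional, Sequence


def parse_args(raw_args: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Same parser as above, repeated so this sketch runs standalone"""
    description = "Wordle implementation in Python, as literate programming"
    parser = argparse.ArgumentParser(prog="pywordle", description=description)
    return parser.parse_args(raw_args)


def help_text() -> str:
    """Hypothetical helper: capture `pywordle --help` output without exiting"""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            parse_args(["--help"])
        except SystemExit as exit_info:
            # argparse exits with status 0 after printing help
            assert exit_info.code == 0
    return buffer.getvalue()


# An empty argument list parses cleanly into an empty namespace
print(vars(parse_args([])))  # {}
print(help_text().splitlines()[0])  # usage: pywordle [-h]
#+end_src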
#+CAPTION: Inject the input and output functions to =play_game=
#+NAME: cli-pargs2
#+begin_src python :tangle no
def play_game_args(raw_args: Optional[Sequence[str]] = None):
    """Play a standard Wordle game from stdin to stdout, given args"""
    _ = parse_args(raw_args)
    game = new_game()
    play_game(game=game, guess_fetcher=input, response_logger=print)
#+end_src
#+CAPTION: Defining an explicit function for the python package to enter through
#+NAME: cli-pargs3
#+begin_src python :tangle no
def main():
    """Pass sys.argv to the play_game_args function"""
    play_game_args(sys.argv[1:])
#+end_src
#+CAPTION: More stdlib imports
#+NAME: cli-pargs-import-std3
#+begin_src python :tangle no
import sys
#+end_src
#+CAPTION: More custom imports
#+NAME: cli-pargs-import-mod
#+begin_src python :tangle no
from literate_wordle.main import new_game, play_game
#+end_src
Since both =main.py= and =cli.py= make up the interactive entrypoint, which
isn't meant to be tested, it's a bit unfair to compute coverage over them.
Let's exclude these two files, preventing them from weighing down the coverage
metric.
#+begin_src conf :tangle .coveragerc
[run]
omit =
    # Don't compute coverage for these 2 manual invocation files
    src/literate_wordle/main.py
    src/literate_wordle/cli.py
#+end_src
** Tangling it out
#+begin_src python :tangle src/literate_wordle/main.py :noweb yes
"""Entrypoint for pywordle"""
<<cli-main-import-std>>
<<cli-main-import-mod>>
<<cli-main1>>
<<cli-main2>>
<<cli-main3>>
#+end_src
#+begin_src python :tangle src/literate_wordle/cli.py :noweb yes
"""Command line entrypoint for pywordle"""
<<cli-pargs-import-std1>>
<<cli-pargs-import-std3>>
<<cli-pargs-import-std2>>
<<cli-pargs-import-mod>>
<<cli-pargs1>>
<<cli-pargs2>>
<<cli-pargs3>>
#+end_src
** Launching as CLI
In Python, when using [[https://python-poetry.org/][Poetry]] like we are, the package is defined in
=pyproject.toml=. Defining a new command means using the
=tool.poetry.scripts= key:
#+begin_src conf :tangle no
[tool.poetry.scripts]
pywordle = "literate_wordle.cli:main"
#+end_src
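The value follows a =module:function= convention. When the package is
installed, a small =pywordle= wrapper script is generated that imports
=literate_wordle.cli= and calls its =main()=. As a rough illustration of how
such a string maps to a callable, here is a sketch using =importlib=. The
=resolve_entrypoint= helper is /hypothetical/ and is not literally what the
installer runs.
#+begin_src python :tangle no
import importlib
from typing import Callable


def resolve_entrypoint(spec: str) -> Callable:
    """Hypothetical: turn "package.module:function" into the callable it names"""
    module_name, _, attr_path = spec.partition(":")
    target = importlib.import_module(module_name)
    # The attribute part may be dotted, e.g. "module:Class.method"
    for attr in attr_path.split("."):
        target = getattr(target, attr)
    return target


# Using a stdlib target, since literate_wordle may not be installed here
dumps = resolve_entrypoint("json:dumps")
print(dumps({"a": 1}))  # {"a": 1}
#+end_src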
So we can now manually invoke this tool. And thanks to the argument parser, a
help message is available:
#+CAPTION: Invoking the help menu. Note the "poetry run" prefix, ensuring we run inside the virtualenv where the package is installed
#+begin_src shell :tangle no :exports both
poetry run pywordle --help
#+end_src
#+RESULTS:
: usage: pywordle [-h]
:
: Wordle implementation in Python, as literate programming
:
: optional arguments:
: -h, --help show this help message and exit
And we can play a round!
: $ poetry shell
: $ pywordle
: hello
: Try again! Guess number 1. Score is: ⬜🟨🟨⬜🟨
: lobes
: Try again! Guess number 2. Score is: 🟨🟩⬜🟩⬜
: cranes
: Guess too long
: crane
: Try again! Guess number 3. Score is: ⬜⬜⬜🟨🟨
: novel
: Correct! Game won in 4 guesses
Taking a step back, we've got command line launch of the game, and we can play
with it. We're done here, especially for a short experimental project.
But if this codebase were to be maintained, extended, or reused, the bar for
"acceptable" test coverage would be much higher.
For instance, we have no test at all on the game loop's guess input/output,
despite all the layers below being pretty well covered. So I'd want tests that
call the =play_game= function with scripted inputs and log the outputs, taking
advantage of the dependency injection we set up to make proper UI-oriented
tests. These would reveal, for instance, that when launching the game, nothing
greets us and there is no prompt for a guess, which is a usability issue.
In our case, that's an exercise left for the reader.
Remember that testing's primary goal is to increase our trust in the system we
build.
In that vein, because we've got feature acceptance tests covered for every
layer, the biggest source of uncertainty in the system is the implementation
itself: we're just not shaking out the code very much, beyond what a normal
usage would look like. This calls for exploring the edge cases that code may
have, regardless of intended features. Every string parameter should be tried
with empty string, uppercase vs lowercase, different encoding, etc.
* Conclusion
We just walked through building a simple wordle program from scratch, using
literate programming to weave a novel's worth of explanations and reasoning,
with code blocks that export to the proper project code locations.
The project uses modern Python tooling (poetry, pytest) and uses formatters
(black, isort), linters (flake8 with plugins), type checkers (mypy), and the
project generates its own general documentation (including this page, if you're
reading it in a browser) and API reference (Sphinx with myst_parser for Markdown
support), enforcing compliance of every tool via make and pre-commit.
The code was written in a Test-driven (TDD) way, as the tests always came before
the feature itself, guiding what the implementation looks like, all the way to
having 100% test coverage (whatever that means).
More importantly in my eyes, we only built what was strictly necessary, by using
Behaviour-driven development (BDD, also called acceptance-test-driven
development) to guide what subfeature to build next based on our needs. These
specifications were encoded as Gherkin Features, available in a dedicated
=features/= folder, and thanks to the magic of Sphinx documentation, each of
those are collected into a list of requirements in a dedicated Requirements
page of the docs.
Since all of the feature files have associated acceptance tests that match the
phrasing of the Gherkin features, future automation work could look at linking
the requirements in Sphinx to the associated test file, so as to finally get
full traceability from requirements, through specifications, to implementation
and finally acceptance tests that pass.
This project was my first foray into literate programming at this scale, an
attempt to bring together all the good ideas of TDD, modern Python development,
Gherkin usage for requirements traceability purposes (without overly zealous
extremes of Cucumber automation). All these ideas were until now scattered,
implemented each without the others in different places, and this project
fuses them into something I hope is more valuable than the sum of its parts.
If you like what you see here, have a look at my other writings, available on
my blog: https://jiby.tech.
* Post-script: scoring bug
A few weeks after initial release of the project, reader [[https://github.com/gpiancastelli][@gpiancastelli]]
helpfully [[https://github.com/OverkillGuy/literate-wordle/issues/1][reported a major bug relating to guess scoring via Github]]. In this
post-script note, I want to report the process of investigating the bug,
present how dissecting the issue made the fix emerge, and reflect on how such a
bug could sneak in despite our careful approach.
I'm painfully aware of the ironic (and embarrassing) aspect of writing a whole
novel about "programming using best practices" only to get such a crucial point
very, very wrong. It would be easy to hide this bug, retroactively change
the narrative above, and pretend we got it right the first time. Instead, I
believe there's a lesson worth learning and sharing in there.
** The bug report
The [[https://github.com/OverkillGuy/literate-wordle/issues/1#issue-1244097196][original bug report]] states (slightly abridged):
#+begin_quote
There's a bug in your score_guess function. If the guess contains two copies of
a letter, and that letter is present only once in the answer, and the second
copy in the guess matches that letter in the answer, the first copy will be
marked as WRONG_PLACE, while the second copy will be marked as NO.
[...] Let's say we have =A__A_= as our guess and =___A_= as the answer.
Your score_guess function will return =🟨__⬜_= instead of =⬜__🟩_=.
#+end_quote
An incorrect scoring function sounds very serious indeed, so the first step is
confirming the issue with a good test case. Can we find words that match the
rule?
#+CAPTION: Search for words matching the bug report, arbitrarily using letter =N=
#+begin_src shell :tangle no
# Pick an answer word ending with "n"
zgrep -iE "n\b" ./src/literate_wordle/assets/wordle_answers_dict.txt.gz
# Pick a guess-word ending with "n", and with another "n"
zgrep -iE "n.*n\b" ./src/literate_wordle/assets/wordle_accepted_words_dict.txt.gz
#+end_src
From the many results (those regular expressions are fairly vague), I manually
chose the answer =train= and the guess =xenon=.
We want to show that =score_guess= is wrong, which is best done by adding a
case to =test_generic_score=:
#+CAPTION: Adding a test case at the end, named =multi_occur_issue1=
#+NAME: scoring-multi-parameters2
#+begin_src python :tangle no
@pytest.mark.parametrize(
    "answer,our_guess,expected_score",
    [
        pytest.param("adage", "adobe", "🟩🟩⬜⬜🟩", id="normal_guess1"),
        pytest.param("serif", "quiet", "⬜⬜🟨🟨⬜", id="normal_guess2"),
        pytest.param("raise", "radix", "🟩🟩⬜🟨⬜", id="normal_guess3"),
        pytest.param("abbey", "kebab", "⬜🟨🟩🟨🟨", id="multi_occur1"),
        pytest.param("abbey", "babes", "🟨🟨🟩🟩⬜", id="multi_occur2"),
        pytest.param("abbey", "abyss", "🟩🟩🟨⬜⬜", id="multi_occur3"),
        pytest.param("abbey", "algae", "🟩⬜⬜⬜🟨", id="multi_occur4"),
        pytest.param("abbey", "keeps", "⬜🟨⬜⬜⬜", id="multi_occur5"),
        pytest.param("abbey", "abate", "🟩🟩⬜⬜🟨", id="multi_occur6"),
        pytest.param("train", "xenon", "⬜⬜⬜⬜🟩", id="multi_occur_issue1"),
    ],
)
#+end_src
Let's run the tests to see the result:
: make test
#+begin_example
poetry run pytest
============ test session starts =============
platform linux -- Python 3.9.5, pytest-7.1.2, pluggy-1.0.0 -- /home/jiby/dev/ws/short/literate_wordle/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/jiby/dev/ws/short/literate_wordle, configfile: pyproject.toml
plugins: cov-3.0.0, clarity-1.0.1
collected 23 items
tests/test_checking_guess_valid_word.py::test_reject_long_words PASSED [ 4%]
tests/test_checking_guess_valid_word.py::test_reject_overly_short_words PASSED [ 8%]
tests/test_checking_guess_valid_word.py::test_reject_nondict_words PASSED [ 13%]
tests/test_checking_guess_valid_word.py::test_accept_dict_words PASSED [ 17%]
tests/test_pick_word.py::test_pick_word_ok_length PASSED [ 21%]
tests/test_scoring_guess.py::test_perfect_guess PASSED [ 26%]
tests/test_scoring_guess.py::test_no_common_character PASSED [ 30%]
tests/test_scoring_guess.py::test_wrong_place PASSED [ 34%]
tests/test_scoring_guess.py::test_generic_score[normal_guess1] PASSED [ 39%]
tests/test_scoring_guess.py::test_generic_score[normal_guess2] PASSED [ 43%]
tests/test_scoring_guess.py::test_generic_score[normal_guess3] PASSED [ 47%]
tests/test_scoring_guess.py::test_generic_score[multi_occur1] PASSED [ 52%]
tests/test_scoring_guess.py::test_generic_score[multi_occur2] PASSED [ 56%]
tests/test_scoring_guess.py::test_generic_score[multi_occur3] PASSED [ 60%]
tests/test_scoring_guess.py::test_generic_score[multi_occur4] PASSED [ 65%]
tests/test_scoring_guess.py::test_generic_score[multi_occur5] PASSED [ 69%]
tests/test_scoring_guess.py::test_generic_score[multi_occur6] PASSED [ 73%]
tests/test_scoring_guess.py::test_generic_score[multi_occur_issue1] FAILED [ 78%]
tests/test_track_guess_number.py::test_first_guess_allowed PASSED [ 82%]
tests/test_track_guess_number.py::test_sixth_guess_allowed PASSED [ 86%]
tests/test_track_guess_number.py::test_seventh_guess_fails_game PASSED [ 91%]
tests/test_track_guess_number.py::test_winning_guess_wins PASSED [ 95%]
tests/test_track_guess_number.py::test_invalid_guess_not_counted PASSED [100%]
================== FAILURES ==================
___ test_generic_score[multi_occur_issue1] ___
answer = 'train', our_guess = 'xenon'
expected_score = '⬜⬜⬜⬜🟩'
@pytest.mark.parametrize(
"answer,our_guess,expected_score",
[
pytest.param("adage", "adobe", "🟩🟩⬜⬜🟩", id="normal_guess1"),
pytest.param("serif", "quiet", "⬜⬜🟨🟨⬜", id="normal_guess2"),
pytest.param("raise", "radix", "🟩🟩⬜🟨⬜", id="normal_guess3"),
pytest.param("abbey", "kebab", "⬜🟨🟩🟨🟨", id="multi_occur1"),
pytest.param("abbey", "babes", "🟨🟨🟩🟩⬜", id="multi_occur2"),
pytest.param("abbey", "abyss", "🟩🟩🟨⬜⬜", id="multi_occur3"),
pytest.param("abbey", "algae", "🟩⬜⬜⬜🟨", id="multi_occur4"),
pytest.param("abbey", "keeps", "⬜🟨⬜⬜⬜", id="multi_occur5"),
pytest.param("abbey", "abate", "🟩🟩⬜⬜🟨", id="multi_occur6"),
pytest.param("train", "xenon", "⬜⬜⬜⬜🟩", id="multi_occur_issue1"),
],
)
def test_generic_score(answer, our_guess, expected_score):
"""Scenario Outline: Scoring guesses"""
# Given a wordle <answer>
# When scoring <guess>
score = score_guess(our_guess, answer)
# Then score should be <score>
> assert score == expected_score
E assert == failed. [pytest-clarity diff shown]
E
E LHS vs RHS shown below
E
E ⬜⬜🟨⬜⬜
E ⬜⬜⬜⬜🟩
E
tests/test_scoring_guess.py:68: AssertionError
- generated xml file: /home/jiby/dev/ws/short/literate_wordle/test_results/results.xml -
----------- coverage: platform linux, python 3.9.5-final-0 -----------
Name                                     Stmts   Miss  Cover
------------------------------------------------------------
src/literate_wordle/__init__.py              0      0   100%
src/literate_wordle/assets/__init__.py       0      0   100%
src/literate_wordle/game.py                 38      0   100%
src/literate_wordle/guess.py                25      0   100%
src/literate_wordle/words.py                32      0   100%
------------------------------------------------------------
TOTAL                                       95      0   100%
Coverage HTML written to dir test_results/coverage.html
Coverage XML written to file test_results/coverage.xml
========== short test summary info ===========
FAILED tests/test_scoring_guess.py::test_generic_score[multi_occur_issue1]
======== 1 failed, 22 passed in 0.15s ========
make: *** [Makefile:16: test] Error 1
#+end_example
Bug confirmed! Whoops.
** Why are we scoring badly?
If necessary, we can step through the example code to figure out what's wrong,
and I did. But overall, it seems that our approach of scoring each character in
a single pass is at fault.
The approach falls down with the example we were given because we don't /first/
detect that the second =n= of =xenon= matches the last =n= of =train= (which
would score it OK, 🟩), and only then, in another pass, mark the first =n= as
non-matching (⬜), since no unmatched =n= remains. Instead, we run over
characters in order, detect an =n= in the wrong place, score it as wrong-place
(🟨), and, having decreased the occurrence counter, count the next one as
non-matching (⬜), hence the bad score.
Thinking it through, it means that the single-pass scoring approach just /cannot/ work, as we /need/ to
"look ahead", knowing already the OK-ness of all guess characters /before/
scoring the wrong-place-ness. Interesting!
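To make this failure mode concrete, here is a /hypothetical/ reconstruction of a
single-pass scorer. It is not the project's original code from the scoring
section. It consumes the answer's occurrence counter in guess order and checks
the counter before checking position, and it reproduces the failing output from
the test run above:
#+begin_src python :tangle no
from collections import Counter

OK, NO, WRONG_PLACE = "🟩", "⬜", "🟨"


def single_pass_score(guess: str, answer: str) -> str:
    """HYPOTHETICAL single-pass scorer, for illustration only"""
    answer_chars = Counter(answer)
    response = ""
    for guess_char, answer_char in zip(guess, answer):
        if answer_chars[guess_char] > 0:
            # Letter still "available": green if aligned, else yellow
            response += OK if guess_char == answer_char else WRONG_PLACE
            answer_chars[guess_char] -= 1
        else:
            # Counter already exhausted: even an exact match ends up grey
            response += NO
    return response


print(single_pass_score("xenon", "train"))  # ⬜⬜🟨⬜⬜ (the reported bug)
#+end_src
This reconstruction still produces the expected =⬜🟨🟩🟨🟨= for =kebab=
against =abbey=. That shows how a single-pass scorer can pass several
multi-occurrence examples while still getting this case wrong.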
So we will re-write this algorithm to work in two passes: First, detect exact
matches of guess/answer character pairs, recording those as perfect score. Then,
a second pairwise check looks for wrong-place score, defaulting to the mismatch "zero" score.
** Fixing the issue
In order to score "out of order" (in two passes), the =response= needs to change
from the original empty string being built, to some random-access structure: a
=list=.
In designing the fix, we realise that a zero score, aka all-mismatch
(⬜⬜⬜⬜⬜) is the "default" case of scoring. That is we "start" from that
score, and score "up" by marking individual characters as matching.
We reflect that in the list initialisation, starting with the worst score,
which means we no longer have to "detect" it. That's a tiny optimization of the
code. But more importantly, this list is randomly accessible, so we can now
"peek ahead" when we couldn't before.
#+CAPTION: Bug fixed, solution in two passes, defaulting score to NO
#+NAME: scoring-guessfunc-impl3
#+begin_src python :tangle no
def score_guess(guess: str, answer: str) -> str:
    """Score an individual guess with Counter"""
    # Counter("abbey") = Counter({'b': 2, 'a': 1, 'e': 1, 'y': 1})
    answer_chars = Counter(answer)
    # NO is the default score, no need to detect it explicitly
    response: list[str] = [CharacterScore.NO] * len(answer)
    # First pass to detect perfect scores
    for char_index, (guess_char, answer_char) in enumerate(zip(guess, answer)):
        if guess_char == answer_char:
            response[char_index] = CharacterScore.OK
            # This answer occurrence is now "used up" by the green square
            answer_chars[guess_char] -= 1
    # Second pass for the yellows
    for char_index, (guess_char, existing_score) in enumerate(zip(guess, response)):
        if existing_score == CharacterScore.OK:
            continue  # It's already green: skip
        if answer_chars[guess_char] > 0:
            response[char_index] = CharacterScore.WRONG_PLACE
            # Reduce occurrence counter since we "used" this occurrence
            answer_chars[guess_char] -= 1
    return "".join(response)
#+end_src
Note another minor change: we removed the check for =guess_char in
answer_chars=. This was previously there to catch the case where the
=answer_chars= dictionary didn't have an entry for this =guess_char=, which
meant trying to access it would raise a =KeyError=, so we'd protect against
that.
But [[https://github.com/OverkillGuy/literate-wordle/issues/1#issuecomment-1156974685=][as @gpiancastelli also pointed out]], a =collections.Counter= isn't a regular
dictionary: [[https://docs.python.org/3/library/collections.html#collections.Counter][the documentation says]] "Counter objects have a dictionary interface
except that they return a zero count for missing items." This helpful
divergence from regular dictionaries protects us already from that missing key
issue, so the code can flow just a little more smoothly.
Had this been a raw dict, not a =Counter=, we could have used the =get= method
to set a default value on a missing key, in the form
=answer_chars.get(guess_char, 0)=. We'd be trading off clarity for briefness.
Not as elegant as what =Counter= allows!
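A quick illustration of the =Counter= behaviour quoted above, compared with the
=dict.get= alternative:
#+begin_src python :tangle no
from collections import Counter

answer_chars = Counter("train")
# Missing keys read as zero instead of raising KeyError...
print(answer_chars["x"])  # 0
# ...and reading them does not insert them into the Counter
print("x" in answer_chars)  # False

# With a plain dict, the same lookup needs .get() and an explicit default
plain_chars = dict(answer_chars)
print(plain_chars.get("x", 0))  # 0
#+end_src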
Still, the bug is fixed, as attested by tests going green again. We also check
linters are happy and coverage is good (they are, it is). All is well!
** Tangling again
We just re-defined (overwrote) a few code blocks from previous sections, so we need
to re-weave them together into a real file.
If we just "fixed" the tangling blocks above, the story would be out of order
and wouldn't make sense.
So we redefine a few files here:
#+NAME: scoring-scenario-multi-identicalanswerchar2
#+CAPTION: Adding bug =xenon= and =train= to examples in documentation
#+begin_src feature :tangle features/scoring_guess.feature
Examples: Reported bug: multiple occurrences of same character in guess
| answer | guess | score |
| train | xenon | ⬜⬜⬜⬜🟩 |
#+end_src
#+NAME: scoring-test1-tangleweb2
#+CAPTION: =tests/test_scoring_guess.py= with our extra testcase at the bottom
#+begin_src python :tangle tests/test_scoring_guess.py :noweb yes
"""Validates the Gherkin file features/scoring_guess.feature:
<<scoring-feature>>
"""
<<scoring-test-import-pytest>>
<<scoring-test-import>>
<<scoring-test1>>
<<scoring-test2>>
<<scoring-test3>>
<<scoring-multi-parameters2>>
<<scoring-multi-skeleton>>
#+end_src
#+CAPTION: Re-defining =src/literate_wordle/guess.py= with our fix
#+NAME: scoring-impl-tangleweb3
#+begin_src python :tangle src/literate_wordle/guess.py :noweb yes
<<scoring-guessmod-header>>
<<scoring-guessfunc-import>>
<<scoring-guess-enum-import>>
<<scoring-guess-enum>>
<<track-guess-perfectscore>>
<<scoring-guessfunc-impl3>>
#+end_src
** Why didn't you catch this earlier? Is it TDD/BDD's fault or are you just a bad dev?
We just found a bug, and fixed it. But why didn't we catch it earlier!? Is TDD
and BDD at fault? Can we just go back to coding without tests!?
I like to think that the process didn't fail as much as my imagination did.
First, note how the Gherkin features, requirements gathering and so on did their
job: we adequately planned for features, defined scenarios that did make sense,
and implemented those correctly. So the BDD side delivered its value!
Purely TDD-wise, all the tests we defined /were/ valid, and covered reasonable
aspects of the features to help design the new functions' shapes, nothing to say
there either.
The failing was in the (lack of) diversity of scores used as examples: we didn't
cover a broad enough set of score samples to find issues like this one.
But finding this bug isn't obvious: if you didn't know about this particular bug
(by reading the source code and spotting a really non-obvious flaw), finding it
would instead require playing this game's implementation at random until you
hit a bad score (which could take minutes or hours, due to the randomness
involved), then reproducing the example and reporting it. This is likely what
the bug reporter did: played around and found a bad case.
As a developer, I didn't have any particular reason to suspect this specific scoring
issue, so I didn't develop a test case with it.
But I like to think that I was so close!
As you can see in the sections above, I /was/ worried about scoring for multiple
letters, as shown in the scoring example table. I remember this being a concern,
because any naive implementation of Wordle could miss the nuance of "the real
Wordle". I even broke out screenshots from the real Wordle website to make up
some references, because I couldn't explain to myself how the scoring /should/
happen.
Unfortunately, my attention was on multiple identical characters in the /answer/, not in
the /guess/.
So, again, I was close enough to look for similar bugs, but didn't quite find a
diverse enough set of sample scores to unearth this particular issue.
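A sample table that also repeats letters in the /guess/ would have exposed this
kind of issue. Here is a minimal sketch, assuming the common two-pass approach
(mark exact matches first, then award yellows only from answer letters not
already used up). The function ~two_pass_score~, the =G=, =Y= and =-= marks and
the sample words are illustrative, not necessarily how this project's scoring
is written.

#+begin_src python
from collections import Counter


def two_pass_score(guess: str, answer: str) -> str:
    """Score a guess: G = right spot, Y = elsewhere in answer, - = not available.

    Hypothetical two-pass sketch: exact matches are marked first, then yellows
    are handed out only from answer letters that no green has used up.
    """
    marks = ["-"] * len(guess)
    leftover = Counter()  # answer letters not consumed by a green match
    # Pass 1: greens, and tally the answer letters they leave unmatched
    for position, (guessed, expected) in enumerate(zip(guess, answer)):
        if guessed == expected:
            marks[position] = "G"
        else:
            leftover[expected] += 1
    # Pass 2: yellows, each consuming one leftover copy of that letter
    for position, guessed in enumerate(guess):
        if marks[position] == "-" and leftover[guessed] > 0:
            marks[position] = "Y"
            leftover[guessed] -= 1
    return "".join(marks)


# Sample scores with repeated letters in the answer *and* in the guess
SAMPLES = [
    # (guess, answer, expected score)
    ("ALLOW", "LLAMA", "YGY--"),  # repeats on both sides
    ("SPEED", "ABIDE", "--Y-Y"),  # repeated guess letter, one copy in answer
    ("LEVEL", "HOTEL", "---GG"),  # greens use up the only E and L
]

for guess, answer, expected in SAMPLES:
    assert two_pass_score(guess, answer) == expected, (guess, answer)
#+end_src

Note how =LEVEL= against =HOTEL= scores =---GG=: the final =E= and =L= use up
the only copies in the answer, so the earlier repeats must be grey, whereas a
naive "is this letter anywhere in the answer?" check would mark them yellow.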
** Why should this be a failure story?
Before we go, I want to flip the narrative around this bug:
The way I see it, I built a fun implementation of Wordle to play with
Python, TDD and BDD. I spent a reasonable amount of time on "due diligence
research" around edge cases (seen in the section above) to feel good about the
solution.
Isolating the bug (by adding a single line to the tests) and fixing it (a few
paragraphs, one function) took a minuscule amount of additional effort, thanks
to our strong test harness.
Avoiding the bug in the first place would have cost a lot more time, doing
research into 100% compatibility with existing Wordle implementations, likely
having to connect someone else's Wordle code to ours to compare (with all the
associated issues to deal with), for comparatively minor benefits.
This isn't NASA, which gets /a single chance to launch a rocket/ and has
(comparatively) infinite engineering time to plan it. In our case, the cost of
making the system that robust can be prohibitive. The discipline of engineering
is about balancing acceptable risks against the costs of reliability.
So, despite having to issue a correction to this narrative, I still believe
the amount of pre-production research was sufficient: we did nothing wrong here.
This bugfix also showcases the iterative nature of software development: earlier
sections demonstrated feature addition as incremental changes, but we see here
that refining the solution when it's subtly wrong is an iterative process too!
So, yeah, building code to be correct the first time is hard. Or maybe almost
impossible. Or even not the best course of action for you!
The best way to build code is to "make it work, make it right, then make it fast"
/in that order/.