Dedicated to design and performance of databases and audio systems.

Last summer, I came across a free ebook from O'Reilly titled *A Whirlwind Tour of Python* by Jake VanderPlas. I had been looking to get a more fundamental knowledge of Python and found the ebook to be very good, and also portable as a PDF.

My goal was to use Python in its base form--no inclusion of any libraries. No numpy nor pandas; those will come later.

However, I did want to have a mathematical type of problem to solve. For that I chose a Sudoku. A Python program that would solve all levels of difficulty: *easy*, *medium*, *hard*, and *evil*.

The problem has defined bounds and rules. It is a nine-cell-squared grid with nine zones of nine cells in a three-by-three configuration. The values 1 through 9 can only be used once within each column, row, and three-by-three grid. Simple, in a kinda-sorta way. The good news is that the logic is based on Set Theory. And, as SQL-heads we live Set Theory every day.

My goals:

- learn Python fundamentally — Python is a common tool for data science
- applied learning by solving a problem — a mathematical problem with a definitive answer; as a
*basis*of ML - the boundaries are fixed
- there are rules
- you cannot alter the environment

Restrictions:

- must solve each puzzle myself (no cheating)
- no Google except for purely Python syntax, but not Sudoku-related Python

Execution:

- serial; not parallel (using my Mac)
- set-up:
- solved cells
- unsolved cells are populated with an array (i.e. Python: list) of the solution domain values (i.e. 1, 2, 3, … 9)
- pruning & solving
- 3 pruning functions; 2 solution functions
- arranged in a sequence of steps — i.e. second function only executed if first function nets no improvement, etc.
- cyclical: prune to solution, and back to prune
- first levels are simply
- prune: remove values from unsolved cell lists that are solved in related x-axis / y-axis / z-zone solved cells
- solve: if one value remains in an unsolved cell’s list then it is the answer; convert list to integer value

Notable

- Set Theory — similar concepts with SQL and Teradata
- a set is unique (i.e.
`SET TABLE`

) - Set operators —
`INTERSECT`

and`DIFFERENCE`

Using sets in Python is handy for uniqueness. Whether it is an address (i.e. cell number) or a solved value, getting a unique set simplifies us having to perform the deduplication process ourselves. Conversion between a list and a set, and vice-versa, is very simple. Sets allow us to leverage the `INTERSECT`

and `DIFFERENCE`

operators which make the pruning of possible values much easier.

As the puzzles get increasingly more complex, the process needs to execute different functions in order to solve them. The easy level of puzzles only requires a simple pruning of related-axis/zone's solved values from an unsolved cell's possible values. When only one possible value is left, it is the answer. However, getting past specific points in a hard- or evil-level puzzle requires finding the one possibility remaining by comparing possibilities on the same axis across multiple zones, for example.

Anyway, it is somewhat awkward to describe in text. Let's go to the code. In it I have embedded an *easy*, *medium*, *hard*, and two *evil* Sudoku's.

I'm sure there are more optimal ways to write some of this. Here is my take.

The second *evil* level puzzle is active, and its process and solution is below. Note the times the process must use the third pruning function. It's what makes the Sudoku *evil*.

` ````
# INITIALIZE: dimensions and Sudoku
dim = 3 # 3 axes
dim2 = dim**2 # 9 cells per column, row, or zone
dim4 = dim**4 # 81 cells in a Sudoku
dom = [1,2,3,4,5,6,7,8,9] # possible values
# load: Sudoku (easy)
#sud = [2,None,8,None,4,6,None,9,None,None,None,None,1,None,None,6,8,3,1,None,9,8,None,3,None,None,4,None,None,2,3,None,None,1,None,8,4,8,None,5,2,None,None,None,None,3,None,1,None,None,None,2,None,6,6,None,None,4,None,None,8,7,9,None,5,None,None,7,8,None,1,None,None,1,None,None,None,9,None,6,None]
# load: Sudoku (medium)
#sud = [None,5,7,None,None,None,4,None,6,None,None,None,5,None,None,8,1,None,9,None,None,6,None,None,None,None,None,None,None,5,None,9,3,None,None,1,8,4,None,None,6,None,None,3,2,6,None,None,8,7,None,5,None,None,None, None,None,None,None,1,None,None,8,None,7,1,None,None,8,None,None,None,4,None,8,None,None,None,1,7,None]
# load: Sudoku (hard)
#sud = [None,None,4,None,9,None,2,None,None,None,9,None,None,None,2,None,None,1,5,2,None,None,4,None,None,None,None,None,None,None,3,None,None,None,7,6,None,6,5,None,None,None,8,2,None,7,8,None,None,None,4,None,None,None,None,None,None,None,1,None,None,8,9,3,None,None,8,None,None,None,6,None,None,None,8,None,6,None,7,None,None]
# load: Sudoku (evil 1)
sud = [None,None,5,1,7,6,None,None,None,4,None,None,None,None,None,None,None,1,None,None,None,8,None,None,3,None,9,1,None,None,None,None,None,None,9,None,5,None,3,None,None,None,8,None,7,None,9,None,None,None,None,None,None,4,2,None,7,None,None,8,None,None,None,6,None,None,None,None,None,None,None,3,None,None,None,4,2,9,7,None,None]
# load: Sudoku (evil 2)
#sud = [None,None,None,None,None,1,None,None,None,None,4,5,None,None,None,9,None,2,2,None,7,None,None,4,None,None,None,7,None,6,None,1,None,None,None,None,None,5,None,4,None,9,None,8,None,None,None,None,None,5,None,2,None,3,None,None,None,6,None,None,8,None,9,4,None,3,None,None,None,7,5,None,None,None,None,3,None,None,None,None,None]
# CLEANSE: replace None values with answer domain
for _i in range(dim4): # spin through entire Sudoku
if not sud[_i]: # if cell is null [None]
sud[_i] = dom # replace None with list of all possible values (i.e. 1, 2, 3, ... 9)
# ADDRESSES: the x, y, and z axies relative to a specified cell
def xyzl(_i):
_xl, _yl, _zl = ([] for _j in range(dim)) # initialize lists
_xyzs = set() # initialize set
for _j in range(dim2): # nine cells in each x or y plane, or z zone
_xl.append(_i - (_i % dim2) + _j) # relative x-plane addresses
_yl.append((_i % dim2) + (_j * dim2)) # relative y-plane addresses
_zl.append(((_i / dim2 / dim * dim) + (_i % dim2 / dim)) * dim2 - (_i % dim2 / dim * (dim * 2)) + _j + (_j / 3 * dim * 2)) # relative z-zone addresses
_xyzs.update(_xl) # x addresses added to the xyz address set
_xyzs.update(_yl) # y addresses added to the xyz address set
_xyzs.update(_zl) # z addresses added to the xyz address set
_xyzl = list(_xyzs) # xyz address set converted to xyz list
return _xl, _yl, _zl, _xyzl
# SUN: solved, unsolved, and number of possibilities
def sun():
s, u, n = (0 for _i in range(dim)) # initialize solved, unsolved, and possible values metrics
for _i in range(dim4): # spin through entire Sudoku
if type(sud[_i]) is int: # solved cell
s += 1 # solved cell quantity
else: # unsolved cell
u += 1 # unsolved cell quantity
n += len(sud[_i]) # number of possible values in unsolved cells
return s, u, n
# PRUNE 1
# remove the cell's possibilities of solved integer values from the relative x/y/z planes/zone
def prune1():
for _i in range(dim4): # spin through entire Sudoku
if type(sud[_i]) is list: # cell data type is list
_ints = set() # initialize solved integer set
_xyzl = xyzl(_i)[3] # unique set of x/y/z addresses relative to i cell
for _j in _xyzl: # each relative x, y, and z address for the cell
if type(sud[_j]) is int: # solved cell value
_ints.add(sud[_j]) # added to the solved set of integers
sud[_i] = list(set(sud[_i]).difference(_ints)) # remove solved integer values from possibilities list
_sun = sun()
print 'prune 1: ', _sun
return _sun
# PRUNE 2
# remove possibilities from cells that can only exist on another, single x- or y-plane within a z-zone
def prune2():
for _i in range(dim4):
_xf, _yf, _zf = ([0] * (dim2 + 1) for _j in range(dim)) # frequency tables
_x_l, _y_l, _xdl, _ydl = ([] for _j in range(dim + 1)) # addresses & possibility differences
_xyzll = xyzl(_i) # relative x/y/z addresses
_xl = _xyzll[0] # relative x addresses
_yl = _xyzll[1] # relative y addresses
_zl = _xyzll[2] # relative z addresses
_xzl = list(set(_xl).intersection(_zl)) # x addresses within z zone
_yzl = list(set(_yl).intersection(_zl)) # y addresses within z zone
for _x in _xzl: # x addresses within z zone
if type(sud[_x]) is list: # cell data type is list
for _n in sud[_x]: # each possible value in the cell unsolved list
_xf[_n] += 1 # x frequency table
for _y in _yzl: # y addresses within z zone
if type(sud[_y]) is list: # cell data type is list
for _n in sud[_y]: # each possible value in the cell unsolved list
_yf[_n] += 1 # y frequency table
for _z in _zl: # z addresses
if type(sud[_z]) is list: # cell data type is list
for _n in sud[_z]: # each possible value in the cell unsolved list
_zf[_n] += 1 # z frequency table
for _n, _f in enumerate(_zf): # z frequency table
if _xf[_n] and _xf[_n] == _f: # if the x-plane frequency for the possible value equals the z-zone frequency for the same possible value
_x_l = list(set(_xl).difference(_xzl)) # x addesses outside z zone
_xdl.append(_n) # x difference list of possibilities
for _x in _x_l: # x addresses outside of z zone
if type(sud[_x]) is list: # cell data type is list
sud[_x] = list(set(sud[_x]).difference(_xdl)) # cell possibilities minus x differences
for _n, _f in enumerate(_zf): # z frequency table
if _yf[_n] and _yf[_n] == _f: # if the y-plane frequency for the possible value equals the z-zone frequency for the same possible value
_y_l = list(set(_yl).difference(_yzl)) # y addresses outside of z zone
_ydl.append(_n) # y difference list of possibilities
for _y in _y_l: # y addresses outside of z zone
if type(sud[_y]) is list: # cell data type is list
sud[_y] = list(set(sud[_y]).difference(_ydl)) # cell possibilities minus y differences
_sun = sun()
print 'prune 2: ', _sun
return _sun
# PRUNE 3
# if an identical list of two values exists on the same x- or y-plane then their possible values cannot exist in any other list on the same plane
def prune3():
for _i in range(dim4): # spin through entire Sudoku
_xyzll = xyzl(_i)
_xl = _xyzll[0]
_yl = _xyzll[1]
if type(sud[_i]) is list: # unsolved
if len(sud[_i]) == 2: # two possible values
for _x in _xl: # x addresses relative to i cell
if _x > _i: # only progress forward to not be redundant
if type(sud[_x]) is list: # cell data type is list
if set(sud[_i]) == set(sud[_x]): # an identical two-value set on the same plane
for _x2 in _xl: # each x-plane address
if _x2 <> _x and _x2 <> _i and type(sud[_x2]) is list: # cell data type is list
sud[_x2] = list(set(sud[_x2]).difference(sud[_x])) # remove possible values from unsolved cell set that cannot occur
if type(sud[_i]) is list: # unsolved
if len(sud[_i]) == 2: # two possible values
for _y in _yl: # y addresses relative to i cell
if _y > _i: # only progress forward to not be redundant
if type(sud[_y]) is list: # cell data type is list
if set(sud[_i]) == set(sud[_y]): # an identical two-value set on the same plane
for _y2 in _yl: # each y-plane address
if _y2 <> _y and _y2 <> _i and type(sud[_y2]) is list: # cell data type is list
sud[_y2] = list(set(sud[_y2]).difference(sud[_y])) # remove possible values from unsolved cell set that cannot occur
_sun = sun()
print 'prune 3: ', _sun
return _sun
# SOLVE 1:
# if a single possible list value remains, convert it to an integer (i.e. solved)
def solve1():
for _i in range(dim4): # spin through entire Sudoku
if type(sud[_i]) is list: # cell data type is a list
if len(sud[_i]) == 1: # only one possible value left in list
sud[_i] = sud[_i][0] # convert list to integer
_sun = sun()
print 'solve 1: ', _sun
return _sun
# SOLVE 2:
# if a single possible value appears only once within an x- or y-plane or z-zone
def solve2():
for _i in range(dim): # x, y, then z planes/zone
for _j in range(dim4): # spin through entire Sudoku
if type(sud[_j]) is list: # cell data type is list
_fl = [0] * (dim2 + 1) # initialize a frequency table (list)
_xyzll = xyzl(_j) # relative x, y, and z addresses for the cell
for _xyz in _xyzll[_i]: # each x, y, or z address relative to the cell
if type(sud[_xyz]) is list: # relative x, y, or z cell data type is list
for _n in sud[_xyz]: # possible values in list
_fl[_n] += 1 # increment the frequency table
for _n in sud[_j]: # each value in the cell's list
if _fl[_n] == 1: # if a value in the frequency table occurs only once
sud[_j] = _n # replace the cell's list with that integer that only occurs once
_sun = sun()
print 'solve 2: ', _sun
return _sun
_sun = sun()
print 'solve 2: ', _sun
return _sun
# EXECUTE
# initial values: solved & unsolved cell count, and unsolved cell possible values
_sun = sun()
_n = _sun[2]
print '\n(solved, unsolved, possible #)'
print 'start 0: ', _sun
# loop until the sudoku is completely solved
while _n:
# loop through pruning processes until the pruning of possible values has occurred or all pruning processes have been attempted
_p1 = prune1()[2]
if (_n - _p1):
_n = _p1
else:
_p2 = prune2()[2]
if (_n - _p2):
_n = _p2
else:
_p3 = prune3()[2]
if (_n - _p3):
_n = _p3
_s1 = solve1()[2]
if (_n - _s1):
_n = _s1
else:
_s2 = solve2()[2]
if (_n - _s2):
_n = _s2
# print SOLUTION
print '\n\n'
for _i in range(dim2): # spin through each column
for _j in range(dim2): # spin through each row
print sud[_i * dim2 + _j],
if _j % 3 == 2 and _j < (dim2 - 1):
print '|',
else:
print ' ',
if _i % 3 == 2 and _i < (dim2 - 1):
print '\n---------------------------------'
else:
print '\n'
print '\n'
```

Note: Indentation in Python matters. It isn't just formatting, it is syntactically necessary for the logic to be correct. For example, note the two `print`

statements at the end. The first is part of the `if`

block while the latter will be the last statement executed.

The process:

` ````
(solved, unsolved, possible #)
start 0: (26, 55, 495)
prune 1: (26, 55, 195)
solve 1: (27, 54, 194)
prune 1: (27, 54, 190)
solve 1: (28, 53, 189)
prune 1: (28, 53, 183)
solve 1: (29, 52, 182)
prune 1: (29, 52, 179)
solve 1: (30, 51, 178)
prune 1: (30, 51, 177)
solve 1: (30, 51, 177)
solve 2: (31, 50, 174)
prune 1: (31, 50, 173)
solve 1: (31, 50, 173)
solve 2: (32, 49, 168)
prune 1: (32, 49, 168)
prune 2: (32, 49, 158)
solve 1: (33, 48, 157)
prune 1: (33, 48, 147)
solve 1: (33, 48, 147)
solve 2: (34, 47, 143)
prune 1: (34, 47, 143)
prune 2: (34, 47, 142)
solve 1: (34, 47, 142)
solve 2: (35, 46, 137)
prune 1: (35, 46, 137)
prune 2: (35, 46, 137)
prune 3: (35, 46, 134)
solve 1: (36, 45, 133)
prune 1: (36, 45, 133)
prune 2: (36, 45, 127)
solve 1: (36, 45, 127)
solve 2: (37, 44, 123)
prune 1: (37, 44, 123)
prune 2: (37, 44, 120)
solve 1: (37, 44, 120)
solve 2: (38, 43, 116)
prune 1: (38, 43, 116)
prune 2: (38, 43, 112)
solve 1: (39, 42, 111)
prune 1: (39, 42, 109)
solve 1: (40, 41, 108)
prune 1: (40, 41, 100)
solve 1: (40, 41, 100)
solve 2: (41, 40, 98)
prune 1: (41, 40, 98)
prune 2: (41, 40, 94)
solve 1: (42, 39, 93)
prune 1: (42, 39, 93)
prune 2: (42, 39, 93)
prune 3: (42, 39, 92)
solve 1: (43, 38, 91)
prune 1: (43, 38, 91)
prune 2: (43, 38, 91)
prune 3: (43, 38, 91)
solve 1: (43, 38, 91)
solve 2: (44, 37, 88)
prune 1: (44, 37, 88)
prune 2: (44, 37, 87)
solve 1: (44, 37, 87)
solve 2: (45, 36, 85)
prune 1: (45, 36, 84)
solve 1: (46, 35, 83)
prune 1: (46, 35, 81)
solve 1: (47, 34, 80)
prune 1: (47, 34, 76)
solve 1: (48, 33, 75)
prune 1: (48, 33, 73)
solve 1: (49, 32, 72)
prune 1: (49, 32, 71)
solve 1: (50, 31, 70)
prune 1: (50, 31, 68)
solve 1: (52, 29, 66)
prune 1: (52, 29, 60)
solve 1: (55, 26, 57)
prune 1: (55, 26, 49)
solve 1: (62, 19, 42)
prune 1: (62, 19, 34)
solve 1: (70, 11, 26)
prune 1: (70, 11, 16)
solve 1: (76, 5, 10)
prune 1: (76, 5, 5)
solve 1: (81, 0, 0)
```

The solution:

` ````
9 3 5 | 1 7 6 | 4 8 2
4 2 8 | 3 9 5 | 6 7 1
7 1 6 | 8 4 2 | 3 5 9
---------------------------------
1 7 4 | 2 8 3 | 5 9 6
5 6 3 | 9 1 4 | 8 2 7
8 9 2 | 5 6 7 | 1 3 4
---------------------------------
2 4 7 | 6 3 8 | 9 1 5
6 8 9 | 7 5 1 | 2 4 3
3 5 1 | 4 2 9 | 7 6 8
```

As you can see by the log, the Sudoku started off with 26 cells having values, leaving 55 to be solved. Those 55 cells were populated with lists of values 1 through 9. The first pruning pass removed all values from those lists which were solved in the same x- or y-axis, or zone cells. Here the result was the elimination of 300 possible values from 495 to 195. Now the fun begins.

The first solve step finds one unsolved cell with one value left in its list. Now answered the solved total increases to 27 and unsolved decreases to 54, while possible remaining values decreases by 1 to 194.

Now that we have solved a cell it's time to prune the remaining possible values again. This time removing the value we just solved for from all unsolved cells' lists in the same x-axis, y-axis, and zone. Four values removed. Now 190 possibilities remain.

From this point there are times when we fail to prune or solve cells with the easy first level of routines and need to kick it up to the next level. I am often asked, why have *level 1* if *level 2* is more sophisticated? The simple answer is because they are different. *Level 2* does not perform the same routine as *level 1*. *Level 1* performs some simple sweeping while *level 2* is more of the vacuum cleaner. And *level 3*? More like the power washer.

In this scenario, note the two instances when the *prune 3* function is actually used. Twice. Not often. But without it the puzzle could not be solved. The second *prune 3* process only removes **one** possible value. One! From 93 to 92. But that's all it took to break the puzzle open.

You may notice there is a third *prune 3* process. It produced no reduction at all. However, in spinning through the subsequent *solve* routines *solve 2* gained further progress. The reason for this was logically when the previous *solve 1* nets a result the process shifts back to pruning--an upfront decision I had made in the program's design.

Finally, another question I get asked is whether or not the program performs any guessing. No. No guesses are made. It is solely set theory executed in procedural looped stages until complete.

I hope this offers a means to learn some of the syntax and capabilities of Python for those unfamiliar with it.

Cheers

-Brad