b¬äd†ødd

Dedicated to design and performance of databases and audio systems.

Learning Python

Last summer, I came across a free ebook from O'Reilly titled A Whirlwind Tour of Python by Jake VanderPlas. I had been looking to get a more fundamental knowledge of Python and found the ebook to be very good, and also portable as a PDF.

 

My goal was to use Python in its base form--no inclusion of any libraries. No numpy nor pandas; those will come later.

 

However, I did want to have a mathematical type of problem to solve. For that I chose a Sudoku. A Python program that would solve all levels of difficulty: easy, medium, hard, and evil.

 

The problem has defined bounds and rules. It is a nine-cell-squared grid with nine zones of nine cells in a three-by-three configuration. The values 1 through 9 can only be used once within each column, row, and three-by-three grid. Simple, in a kinda-sorta way. The good news is that the logic is based on Set Theory. And, as SQL-heads we live Set Theory every day.

 

My goals:

  • learn Python fundamentally — Python is a common tool for data science
  • applied learning by solving a problem — a mathematical problem with a definitive answer; as a basis of ML
    • the boundaries are fixed
    • there are rules
    • you cannot alter the environment

 

Restrictions:

  • must solve each puzzle myself (no cheating)
  • no Google except for purely Python syntax, but not Sudoku-related Python

 

Execution:

  • serial; not parallel (using my Mac)
  • set-up:
    • solved cells
    • unsolved cells are populated with an array (i.e. Python: list) of the solution domain values (i.e. 1, 2, 3, … 9)
  • pruning & solving
    • 3 pruning functions; 2 solution functions
    • arranged in a sequence of steps — i.e. second function only executed if first function nets no improvement, etc.
    • cyclical: prune to solution, and back to prune
    • first levels are simply
      • prune: remove values from unsolved cell lists that are solved in related x-axis / y-axis / z-zone solved cells
      • solve: if one value remains in an unsolved cell’s list then it is the answer; convert list to integer value

 

Notable

  • Set Theory — similar concepts with SQL and Teradata
    • a set is unique (i.e. SET TABLE)
    • Set operators — INTERSECT and DIFFERENCE

 

Using sets in Python is handy for uniqueness. Whether it is an address (i.e. cell number) or a solved value, getting a unique set simplifies us having to perform the deduplication process ourselves. Conversion between a list and a set, and vice-versa, is very simple. Sets allow us to leverage the INTERSECT and DIFFERENCE operators which make the pruning of possible values much easier.

 

As the puzzles get increasingly more complex, the process needs to execute different functions in order to solve them. The easy level of puzzles only requires a simple pruning of related-axis/zone's solved values from an unsolved cell's possible values. When only one possible value is left, it is the answer. However, getting past specific points in a hard- or evil-level puzzle requires finding the one possibility remaining by comparing possibilities on the same axis across multiple zones, for example.

 

Anyway, it is somewhat awkward to describe in text. Let's go to the code. In it I have embedded an easy, medium, hard, and two evil Sudoku's.

 

I'm sure there are more optimal ways to write some of this. Here is my take.

 

The second evil level puzzle is active, and its process and solution is below. Note the times the process must use the third pruning function. It's what makes the Sudoku evil.

 

            	
# INITIALIZE: dimensions and Sudoku
dim = 3 # 3 axes
dim2 = dim**2 # 9 cells per column, row, or zone
dim4 = dim**4 # 81 cells in a Sudoku
dom = [1,2,3,4,5,6,7,8,9] # possible values

# load: Sudoku (easy)
#sud = [2,None,8,None,4,6,None,9,None,None,None,None,1,None,None,6,8,3,1,None,9,8,None,3,None,None,4,None,None,2,3,None,None,1,None,8,4,8,None,5,2,None,None,None,None,3,None,1,None,None,None,2,None,6,6,None,None,4,None,None,8,7,9,None,5,None,None,7,8,None,1,None,None,1,None,None,None,9,None,6,None]

# load: Sudoku (medium)
#sud = [None,5,7,None,None,None,4,None,6,None,None,None,5,None,None,8,1,None,9,None,None,6,None,None,None,None,None,None,None,5,None,9,3,None,None,1,8,4,None,None,6,None,None,3,2,6,None,None,8,7,None,5,None,None,None, None,None,None,None,1,None,None,8,None,7,1,None,None,8,None,None,None,4,None,8,None,None,None,1,7,None]

# load: Sudoku (hard)
#sud = [None,None,4,None,9,None,2,None,None,None,9,None,None,None,2,None,None,1,5,2,None,None,4,None,None,None,None,None,None,None,3,None,None,None,7,6,None,6,5,None,None,None,8,2,None,7,8,None,None,None,4,None,None,None,None,None,None,None,1,None,None,8,9,3,None,None,8,None,None,None,6,None,None,None,8,None,6,None,7,None,None]

# load: Sudoku (evil 1)
sud = [None,None,5,1,7,6,None,None,None,4,None,None,None,None,None,None,None,1,None,None,None,8,None,None,3,None,9,1,None,None,None,None,None,None,9,None,5,None,3,None,None,None,8,None,7,None,9,None,None,None,None,None,None,4,2,None,7,None,None,8,None,None,None,6,None,None,None,None,None,None,None,3,None,None,None,4,2,9,7,None,None]

# load: Sudoku (evil 2)
#sud = [None,None,None,None,None,1,None,None,None,None,4,5,None,None,None,9,None,2,2,None,7,None,None,4,None,None,None,7,None,6,None,1,None,None,None,None,None,5,None,4,None,9,None,8,None,None,None,None,None,5,None,2,None,3,None,None,None,6,None,None,8,None,9,4,None,3,None,None,None,7,5,None,None,None,None,3,None,None,None,None,None]

# CLEANSE: replace None values with answer domain
for _i in range(dim4): # spin through entire Sudoku
    if not sud[_i]: # if cell is null [None]
        sud[_i] = dom # replace None with list of all possible values (i.e. 1, 2, 3, ... 9)

# ADDRESSES: the x, y, and z axies relative to a specified cell
def xyzl(_i):
    _xl, _yl, _zl = ([] for _j in range(dim)) # initialize lists
    _xyzs = set() # initialize set

    for _j in range(dim2): # nine cells in each x or y plane, or z zone
        _xl.append(_i - (_i % dim2) + _j) # relative x-plane addresses
        _yl.append((_i % dim2) + (_j * dim2)) # relative y-plane addresses
        _zl.append(((_i / dim2 / dim * dim) + (_i % dim2 / dim)) * dim2 - (_i % dim2 / dim * (dim * 2)) + _j + (_j / 3 * dim * 2)) # relative z-zone addresses

    _xyzs.update(_xl) # x addresses added to the xyz address set
    _xyzs.update(_yl) # y addresses added to the xyz address set
    _xyzs.update(_zl) # z addresses added to the xyz address set

    _xyzl = list(_xyzs) # xyz address set converted to xyz list

    return _xl, _yl, _zl, _xyzl

# SUN: solved, unsolved, and number of possibilities
def sun():
    s, u, n = (0 for _i in range(dim)) # initialize solved, unsolved, and possible values metrics

    for _i in range(dim4): # spin through entire Sudoku
        if type(sud[_i]) is int: # solved cell
            s += 1 # solved cell quantity
        else: # unsolved cell
            u += 1 # unsolved cell quantity
            n += len(sud[_i]) # number of possible values in unsolved cells

    return s, u, n

# PRUNE 1
# remove the cell's possibilities of solved integer values from the relative x/y/z planes/zone
def prune1():
    for _i in range(dim4): # spin through entire Sudoku
        if type(sud[_i]) is list: # cell data type is list
            _ints = set() # initialize solved integer set

            _xyzl = xyzl(_i)[3] # unique set of x/y/z addresses relative to i cell

            for _j in _xyzl: # each relative x, y, and z address for the cell
                if type(sud[_j]) is int: # solved cell value
                    _ints.add(sud[_j]) # added to the solved set of integers

            sud[_i] = list(set(sud[_i]).difference(_ints)) # remove solved integer values from possibilities list

    _sun = sun()
    print 'prune 1: ', _sun
    return _sun

# PRUNE 2
# remove possibilities from cells that can only exist on another, single x- or y-plane within a z-zone
def prune2():
    for _i in range(dim4):
        _xf, _yf, _zf = ([0] * (dim2 + 1) for _j in range(dim)) # frequency tables
        _x_l, _y_l, _xdl, _ydl = ([] for _j in range(dim + 1)) # addresses & possibility differences

        _xyzll = xyzl(_i) # relative x/y/z addresses
        _xl = _xyzll[0] # relative x addresses
        _yl = _xyzll[1] # relative y addresses
        _zl = _xyzll[2] # relative z addresses
        
        _xzl = list(set(_xl).intersection(_zl)) # x addresses within z zone
        _yzl = list(set(_yl).intersection(_zl)) # y addresses within z zone

        for _x in _xzl: # x addresses within z zone
            if type(sud[_x]) is list: # cell data type is list
                for _n in sud[_x]: # each possible value in the cell unsolved list
                    _xf[_n] += 1 # x frequency table

        for _y in _yzl: # y addresses within z zone
            if type(sud[_y]) is list: # cell data type is list
                for _n in sud[_y]: # each possible value in the cell unsolved list
                    _yf[_n] += 1 # y frequency table

        for _z in _zl: # z addresses
            if type(sud[_z]) is list: # cell data type is list
                for _n in sud[_z]: # each possible value in the cell unsolved list
                    _zf[_n] += 1 # z frequency table

        for _n, _f in enumerate(_zf): # z frequency table
            if _xf[_n] and _xf[_n] == _f: # if the x-plane frequency for the possible value equals the z-zone frequency for the same possible value
                _x_l = list(set(_xl).difference(_xzl)) # x addesses outside z zone
                _xdl.append(_n) # x difference list of possibilities

        for _x in _x_l: # x addresses outside of z zone
            if type(sud[_x]) is list: # cell data type is list
                sud[_x] = list(set(sud[_x]).difference(_xdl)) # cell possibilities minus x differences

        for _n, _f in enumerate(_zf): # z frequency table
            if _yf[_n] and _yf[_n] == _f: # if the y-plane frequency for the possible value equals the z-zone frequency for the same possible value
                _y_l = list(set(_yl).difference(_yzl)) # y addresses outside of z zone
                _ydl.append(_n) # y difference list of possibilities
    
        for _y in _y_l: # y addresses outside of z zone
            if type(sud[_y]) is list: # cell data type is list
                sud[_y] = list(set(sud[_y]).difference(_ydl)) # cell possibilities minus y differences

    _sun = sun()
    print 'prune 2: ', _sun
    return _sun

# PRUNE 3
# if an identical list of two values exists on the same x- or y-plane then their possible values cannot exist in any other list on the same plane
def prune3():
    for _i in range(dim4): # spin through entire Sudoku
        _xyzll = xyzl(_i)
        _xl = _xyzll[0]
        _yl = _xyzll[1]

        if type(sud[_i]) is list: # unsolved
            if len(sud[_i]) == 2: # two possible values
                for _x in _xl: # x addresses relative to i cell
                    if _x > _i: # only progress forward to not be redundant
                        if type(sud[_x]) is list: # cell data type is list
                            if set(sud[_i]) == set(sud[_x]): # an identical two-value set on the same plane
                                for _x2 in _xl: # each x-plane address
                                    if _x2 <> _x and _x2 <> _i and type(sud[_x2]) is list: # cell data type is list
                                        sud[_x2] = list(set(sud[_x2]).difference(sud[_x])) # remove possible values from unsolved cell set that cannot occur

        if type(sud[_i]) is list: # unsolved
            if len(sud[_i]) == 2: # two possible values
                for _y in _yl: # y addresses relative to i cell
                    if _y > _i: # only progress forward to not be redundant
                        if type(sud[_y]) is list: # cell data type is list
                            if set(sud[_i]) == set(sud[_y]): # an identical two-value set on the same plane
                                for _y2 in _yl: # each y-plane address
                                    if _y2 <> _y and _y2 <> _i and type(sud[_y2]) is list: # cell data type is list
                                        sud[_y2] = list(set(sud[_y2]).difference(sud[_y])) # remove possible values from unsolved cell set that cannot occur

    _sun = sun()
    print 'prune 3: ', _sun
    return _sun

# SOLVE 1:
# if a single possible list value remains, convert it to an integer (i.e. solved)
def solve1():
    for _i in range(dim4): # spin through entire Sudoku
        if type(sud[_i]) is list: # cell data type is a list
            if len(sud[_i]) == 1: # only one possible value left in list
                sud[_i] = sud[_i][0] # convert list to integer

    _sun = sun()
    print 'solve 1: ', _sun
    return _sun

# SOLVE 2:
# if a single possible value appears only once within an x- or y-plane or z-zone
def solve2():
    for _i in range(dim): # x, y, then z planes/zone
        for _j in range(dim4): # spin through entire Sudoku
            if type(sud[_j]) is list: # cell data type is list
                _fl = [0] * (dim2 + 1) # initialize a frequency table (list)

                _xyzll = xyzl(_j) # relative x, y, and z addresses for the cell

                for _xyz in _xyzll[_i]: # each x, y, or z address relative to the cell
                    if type(sud[_xyz]) is list: # relative x, y, or z cell data type is list
                        for _n in sud[_xyz]: # possible values in list
                            _fl[_n] += 1 # increment the frequency table
        
                for _n in sud[_j]: # each value in the cell's list
                    if _fl[_n] == 1: # if a value in the frequency table occurs only once
                        sud[_j] = _n # replace the cell's list with that integer that only occurs once

                        _sun = sun()
                        print 'solve 2: ', _sun
                        return _sun

    _sun = sun()
    print 'solve 2: ', _sun
    return _sun

# EXECUTE
# initial values: solved & unsolved cell count, and unsolved cell possible values
_sun = sun()
_n = _sun[2]
print '\n(solved, unsolved, possible #)'
print 'start 0: ', _sun

# loop until the sudoku is completely solved
while _n:
    # loop through pruning processes until the pruning of possible values has occurred or all pruning processes have been attempted
    _p1 = prune1()[2]
    if (_n - _p1):
        _n = _p1
    else:
        _p2 = prune2()[2]
        if (_n - _p2):
            _n = _p2
        else:
            _p3 = prune3()[2]
            if (_n - _p3):
                _n = _p3

    _s1 = solve1()[2]
    if (_n - _s1):
        _n = _s1
    else:
        _s2 = solve2()[2]
        if (_n - _s2):
            _n = _s2

# print SOLUTION
print '\n\n'
for _i in range(dim2): # spin through each column
    for _j in range(dim2): # spin through each row
        print sud[_i * dim2 + _j],
        if _j % 3 == 2 and _j < (dim2 - 1):
            print '|',
        else:
            print ' ',
    if _i % 3 == 2 and _i < (dim2 - 1):
        print '\n---------------------------------'
    else:
        print '\n'
print '\n'
				
			

Note: Indentation in Python matters. It isn't just formatting, it is syntactically necessary for the logic to be correct. For example, note the two print statements at the end. The first is part of the if block while the latter will be the last statement executed.

 

The process:

 

				
(solved, unsolved, possible #)
start 0:  (26, 55, 495)
prune 1:  (26, 55, 195)
solve 1:  (27, 54, 194)
prune 1:  (27, 54, 190)
solve 1:  (28, 53, 189)
prune 1:  (28, 53, 183)
solve 1:  (29, 52, 182)
prune 1:  (29, 52, 179)
solve 1:  (30, 51, 178)
prune 1:  (30, 51, 177)
solve 1:  (30, 51, 177)
solve 2:  (31, 50, 174)
prune 1:  (31, 50, 173)
solve 1:  (31, 50, 173)
solve 2:  (32, 49, 168)
prune 1:  (32, 49, 168)
prune 2:  (32, 49, 158)
solve 1:  (33, 48, 157)
prune 1:  (33, 48, 147)
solve 1:  (33, 48, 147)
solve 2:  (34, 47, 143)
prune 1:  (34, 47, 143)
prune 2:  (34, 47, 142)
solve 1:  (34, 47, 142)
solve 2:  (35, 46, 137)
prune 1:  (35, 46, 137)
prune 2:  (35, 46, 137)
prune 3:  (35, 46, 134)
solve 1:  (36, 45, 133)
prune 1:  (36, 45, 133)
prune 2:  (36, 45, 127)
solve 1:  (36, 45, 127)
solve 2:  (37, 44, 123)
prune 1:  (37, 44, 123)
prune 2:  (37, 44, 120)
solve 1:  (37, 44, 120)
solve 2:  (38, 43, 116)
prune 1:  (38, 43, 116)
prune 2:  (38, 43, 112)
solve 1:  (39, 42, 111)
prune 1:  (39, 42, 109)
solve 1:  (40, 41, 108)
prune 1:  (40, 41, 100)
solve 1:  (40, 41, 100)
solve 2:  (41, 40, 98)
prune 1:  (41, 40, 98)
prune 2:  (41, 40, 94)
solve 1:  (42, 39, 93)
prune 1:  (42, 39, 93)
prune 2:  (42, 39, 93)
prune 3:  (42, 39, 92)
solve 1:  (43, 38, 91)
prune 1:  (43, 38, 91)
prune 2:  (43, 38, 91)
prune 3:  (43, 38, 91)
solve 1:  (43, 38, 91)
solve 2:  (44, 37, 88)
prune 1:  (44, 37, 88)
prune 2:  (44, 37, 87)
solve 1:  (44, 37, 87)
solve 2:  (45, 36, 85)
prune 1:  (45, 36, 84)
solve 1:  (46, 35, 83)
prune 1:  (46, 35, 81)
solve 1:  (47, 34, 80)
prune 1:  (47, 34, 76)
solve 1:  (48, 33, 75)
prune 1:  (48, 33, 73)
solve 1:  (49, 32, 72)
prune 1:  (49, 32, 71)
solve 1:  (50, 31, 70)
prune 1:  (50, 31, 68)
solve 1:  (52, 29, 66)
prune 1:  (52, 29, 60)
solve 1:  (55, 26, 57)
prune 1:  (55, 26, 49)
solve 1:  (62, 19, 42)
prune 1:  (62, 19, 34)
solve 1:  (70, 11, 26)
prune 1:  (70, 11, 16)
solve 1:  (76, 5, 10)
prune 1:  (76, 5, 5)
solve 1:  (81, 0, 0)
				
			

 

The solution:

 

				
9   3   5 | 1   7   6 | 4   8   2  
 
4   2   8 | 3   9   5 | 6   7   1  
 
7   1   6 | 8   4   2 | 3   5   9  
---------------------------------
1   7   4 | 2   8   3 | 5   9   6  
 
5   6   3 | 9   1   4 | 8   2   7  
 
8   9   2 | 5   6   7 | 1   3   4  
---------------------------------
2   4   7 | 6   3   8 | 9   1   5  
 
6   8   9 | 7   5   1 | 2   4   3  
 
3   5   1 | 4   2   9 | 7   6   8  
				
			

 

As you can see by the log, the Sudoku started off with 26 cells having values, leaving 55 to be solved. Those 55 cells were populated with lists of values 1 through 9. The first pruning pass removed all values from those lists which were solved in the same x- or y-axis, or zone cells. Here the result was the elimination of 300 possible values from 495 to 195. Now the fun begins.

 

The first solve step finds one unsolved cell with one value left in its list. Now answered the solved total increases to 27 and unsolved decreases to 54, while possible remaining values decreases by 1 to 194.

 

Now that we have solved a cell it's time to prune the remaining possible values again. This time removing the value we just solved for from all unsolved cells' lists in the same x-axis, y-axis, and zone. Four values removed. Now 190 possibilities remain.

 

From this point there are times when we fail to prune or solve cells with the easy first level of routines and need to kick it up to the next level. I am often asked, why have level 1 if level 2 is more sophisticated? The simple answer is because they are different. Level 2 does not perform the same routine as level 1. Level 1 performs some simple sweeping while level 2 is more of the vacuum cleaner. And level 3? More like the power washer.

 

In this scenario, note the two instances when the prune 3 function is actually used. Twice. Not often. But without it the puzzle could not be solved. The second prune 3 process only removes one possible value. One! From 93 to 92. But that's all it took to break the puzzle open.

 

You may notice there is a third prune 3 process. It produced no reduction at all. However, in spinning through the subsequent solve routines solve 2 gained further progress. The reason for this was logically when the previous solve 1 nets a result the process shifts back to pruning--an upfront decision I had made in the program's design.

 

Finally, another question I get asked is whether or not the program performs any guessing. No. No guesses are made. It is solely set theory executed in procedural looped stages until complete.

 

I hope this offers a means to learn some of the syntax and capabilities of Python for those unfamiliar with it.

 

Cheers
-Brad