Recently I found myself dividing my attention between watching the Minnesota Twins lose yet another ballgame and goofing around with an idea that popped into my head for no reason I can name.

The idea turned out better than imagined in terms of capability given its design simplicity, so I thought I’d document it here. It’s not super useful as is, but those relatively new to Python might find it educational or otherwise helpful.

The basic idea (not original with me) is to break text down into sections, paragraphs, sentences, and words, each of which is an object. So, a section is an object containing an indexable list of paragraphs; each paragraph is an object containing a list of sentences; each sentence is an object containing a list of words; and each word is an object containing a list of characters.

Note that for the list of characters in word objects, we’ll just use strings — which are list-like objects — but we could implement words as lists of individual chars if we wanted our design model to be 100% consistent. Doing that makes the code a bit larger and slower but might offer some advantages in editing or Unicode handling.
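To make that trade-off concrete, here is a minimal sketch (the variable names are purely illustrative) comparing a word stored as a string with the same word stored as a list of characters. Both behave alike for reading, but only the list version can be edited in place.

```python
# A word stored as a string versus as a list of single characters.
word_as_str = "Hello"
word_as_list = list("Hello")

# Both support len(), indexing, and iteration.
assert len(word_as_str) == len(word_as_list) == 5
assert word_as_str[1] == word_as_list[1] == "e"

# Only the list can be edited in place; strings are immutable.
word_as_list[0] = "J"
try:
    word_as_str[0] = "J"
    str_is_mutable = True
except TypeError:
    str_is_mutable = False

print("".join(word_as_list))   # Jello
print(str_is_mutable)          # False
```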

We’ll look at the code in several iterations. The first iteration is the original version I “doodled” while watching the Twins lose. The later iterations progressively improve on the design. The end result, perhaps of dubious use, is at least kinda fun.


The first iteration uses typename, a lambda function defined in my goodies library file. Code in this post can’t assume said library, so we need to recreate typename:

001| typename = lambda obj: type(obj).__name__
002| 

The function takes an object and returns the name of its class:

001| from examples import typename
002| 
003| if __name__ == '__main__':
004|     a = 42
005|     b = '42'
006|     c = (21, 42, 63, 84)
007|     d = [86, 99]
008|     e = object()
009| 
010|     print(f'a: {typename(a)}')
011|     print(f'b: {typename(b)}')
012|     print(f'c: {typename(c)}')
013|     print(f'd: {typename(d)}')
014|     print(f'e: {typename(e)}')
015|     print()
016| 

[If you’re new to “f-strings”, see: Simple Python Tricks #8.]

When run, this prints:

a: int
b: str
c: tuple
d: list
e: object

With that, we’re ready to define a Word object:

001| from examples import typename
002| 
003| class Word:
004|     '''Implements a Word as a list of characters (a string).'''
005| 
006|     def __init__ (self, text):
007|         '''Initialize new Word instance.'''
008|         self.chars = text
009| 
010|     def __bool__ (self):
011|         '''A Word is True if it has non-zero length.'''
012|         return (0 < len(self.chars))
013| 
014|     def __len__ (self):
015|         '''Return Word length.'''
016|         return len(self.chars)
017| 
018|     def __getitem__ (self, idx):
019|         '''Return a character by index.'''
020|         return self.chars[idx]
021| 
022|     def __iter__ (self):
023|         '''Return an iterator over the Word characters.'''
024|         return iter(self.chars)
025| 
026|     def __repr__ (self):
027|         '''Return a REPR string.'''
028|         return f'<{typename(self)} @{id(self):08x}>'
029| 
030|     def __str__ (self):
031|         '''Word string is just its characters.'''
032|         return self.chars
033| 
034| 
035| if __name__ == '__main__':
036| 
037|     # New Word instance…
038|     wrd = Word("Hello")
039| 
040|     print(f'String: {wrd!s}')
041|     print(f'REPR:   {wrd!r}')
042|     print(f'bool:   {bool(wrd)}')
043|     print(f'len:    {len(wrd)}')
044|     print(f'[2]:    {wrd[2]}')
045|     print()
046|     for ix,char in enumerate(wrd):
047|         print(f'{ix}: {char}')
048|     print()
049| 

The Word class stores its characters in an internal string (self.chars), so its Boolean value (lines #10 to #12), length (lines #14 to #16), indexing (lines #18 to #20), iteration (lines #22 to #24), and string version (lines #30 to #32) all simply delegate to that string.

The code in __repr__ (lines #26 to #28) is why we need the typename function. We’ll use the same code in all classes.

When run, this prints:

String: Hello
REPR:   <Word @14c8babf220>
bool:   True
len:    5
[2]:    l

0: H
1: e
2: l
3: l
4: o
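As a small aside on the __repr__ method, here is a sketch of why it looks up the class name at run time rather than hard-coding it: the same code reports the right name even when it is inherited. The Base and Child classes are made up for this illustration.

```python
typename = lambda obj: type(obj).__name__

class Base:
    def __repr__(self):
        # type(self) is the instance's actual class, not the class
        # where __repr__ happens to be defined.
        return f'<{typename(self)} @{id(self):08x}>'

class Child(Base):
    pass

base_repr = repr(Base())
child_repr = repr(Child())
print(base_repr)    # starts with <Base @
print(child_repr)   # starts with <Child @
```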

So far, so good. Now let’s implement a Sentence:

001| from examples import typename, Word
002| 
003| class Sentence:
004|     '''Implements a Sentence as a list of Words.'''
005| 
006|     def __init__ (self, text):
007|         '''Initialize new Sentence instance.'''
008|         self.words = [Word(w) for w in text.split()]
009| 
010|     def __bool__ (self):
011|         '''A Sentence is True if it has non-zero length.'''
012|         return (0 < len(self.words))
013| 
014|     def __len__ (self):
015|         '''Return Sentence length.'''
016|         return len(self.words)
017| 
018|     def __getitem__ (self, idx):
019|         '''Return a Word by index.'''
020|         return self.words[idx]
021| 
022|     def __iter__ (self):
023|         '''Return an iterator over the Sentence words.'''
024|         return iter(self.words)
025| 
026|     def __repr__ (self):
027|         '''Return a REPR string.'''
028|         return f'<{typename(self)} @{id(self):08x}>'
029| 
030|     def __str__ (self):
031|         '''Sentence string is its words joined by spaces.'''
032|         text = ' '.join(str(w) for w in self.words)
033|         return text
034| 
035| 
036| if __name__ == '__main__':
037| 
038|     # New Sentence instance…
039|     snt = Sentence("Now is the time for all good people to party.")
040| 
041|     print(f'String: {snt!s}')
042|     print(f'REPR:   {snt!r}')
043|     print(f'bool:   {bool(snt)}')
044|     print(f'len:    {len(snt)}')
045|     print(f'[2]:    {snt[2]}')
046|     print()
047|     for ix,wrd in enumerate(snt):
048|         print(f'{ix}: {wrd}')
049|     print()
050| 

The one real difference from Word is the __init__ method (lines #6 to #8), which splits the text into whitespace-separated words to build a list of Word objects. (Note that str.split with no arguments treats any run of whitespace, including multiple spaces, tabs, and newlines, as a single delimiter.)

Beyond that, the Sentence class is nearly identical to the Word class. When run, this prints:

String: Now is the time for all good people to party.
REPR:   <Sentence @14bf2b6eb90>
bool:   True
len:    10
[2]:    the

0: Now
1: is
2: the
3: time
4: for
5: all
6: good
7: people
8: to
9: party.
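The whitespace handling in Sentence comes entirely from calling str.split with no arguments. A quick sketch of the difference between that and splitting on an explicit single space (the sample text is invented for the demonstration):

```python
text = "Now  is the\ttime\nfor all"

# No argument: any run of whitespace (spaces, tabs, newlines) is one delimiter.
default_split = text.split()

# Explicit single space: every space is a delimiter, so a double space
# yields an empty string, and tabs and newlines are not split on at all.
space_split = text.split(' ')

print(default_split)  # ['Now', 'is', 'the', 'time', 'for', 'all']
print(space_split)    # ['Now', '', 'is', 'the\ttime\nfor', 'all']
```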

Moving on, now let’s implement the Paragraph class:

001| from examples import typename, Sentence
002| 
003| class Paragraph:
004|     '''Implements a Paragraph as a list of Sentences.'''
005| 
006|     def __init__ (self, text):
007|         '''Initialize new Paragraph instance.'''
008|         self.lines = [Sentence(s) for s in text.split('.') if 0 < len(s)]
009| 
010|     def __bool__ (self):
011|         '''A Paragraph is True if it has non-zero length.'''
012|         return (0 < len(self.lines))
013| 
014|     def __len__ (self):
015|         '''Return Paragraph length.'''
016|         return len(self.lines)
017| 
018|     def __getitem__ (self, idx):
019|         '''Return a sentence (as a string, period restored) by index.'''
020|         return f'{self.lines[idx]}.'
021| 
022|     def __iter__ (self):
023|         '''Return an iterator over the Paragraph sentences.'''
024|         return iter(self.lines)
025| 
026|     def __repr__ (self):
027|         '''Return a REPR string.'''
028|         return f'<{typename(self)} @{id(self):08x}>'
029| 
030|     def __str__ (self):
031|         '''Paragraph string is its sentences joined by spaces.'''
032|         return ' '.join(f'{s}.' for s in self.lines)
033| 
034| 
035| if __name__ == '__main__':
036| 
037|     # New Paragraph instance…
038|     par = Paragraph("Hello, World! How are you. I am fine. Good-bye now.")
039| 
040|     print(f'String: {par!s}')
041|     print(f'REPR:   {par!r}')
042|     print(f'bool:   {bool(par)}')
043|     print(f'len:    {len(par)}')
044|     print(f'[2]:    {par[2]}')
045|     print()
046|     for ix,snt in enumerate(par):
047|         print(f'{ix}: {par[ix]}')
048|         print(f'{ix}: {snt}')
049|     print()
050| 

No surprises, this is again nearly identical to the previous classes. But a crack has begun to show in its functionality. When run, this prints:

String: Hello, World! How are you. I am fine. Good-bye now.
REPR:   <Paragraph @2823a305780>
bool:   True
len:    3
[2]:    Good-bye now.

0: Hello, World! How are you.
0: Hello, World! How are you
1: I am fine.
1: I am fine
2: Good-bye now.
2: Good-bye now

Which seems to work okay, but iterating over the sentences with an iterator (rather than by indexing) makes it clear we’re missing something.

Firstly, the code works if every sentence ends with a period — because we’re splitting the text on periods — but if a sentence ends with something else (a question mark or exclamation mark), that sentence gets merged with the next one.

Secondly, the Sentence objects don’t know how they’re supposed to end — periods are assumed. And this requires some gyrations when emitting sentences — we have to tack the periods back on (lines #20 and #32). This is all a bit … ugly.

Ideally, a Sentence object should know how it ends. We’ll fix this below, but for now, let’s continue with the design pattern.
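To see the first problem in isolation, here is what splitting the sample text on periods produces, using plain strings rather than the classes above:

```python
text = "Hello, World! How are you. I am fine. Good-bye now."

pieces = text.split('.')
print(pieces)
# ['Hello, World! How are you', ' I am fine', ' Good-bye now', '']

# Dropping empty pieces, as the Paragraph class does, leaves three
# "sentences", with the first two real sentences merged into one.
sentences = [s for s in pieces if 0 < len(s)]
print(len(sentences))   # 3
```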

The final piece is the Section class:

001| from examples import typename, Paragraph
002| 
003| class Section:
004|     '''Implements a Section as a list of Paragraphs.'''
005| 
006|     def __init__ (self, text):
007|         '''Initialize new Section instance.'''
008|         self.paras = [Paragraph(p) for p in text.split('\n\n')]
009| 
010|     def __bool__ (self):
011|         '''A Section is True if it has non-zero length.'''
012|         return (0 < len(self.paras))
013| 
014|     def __len__ (self):
015|         '''Return Section length.'''
016|         return len(self.paras)
017| 
018|     def __getitem__ (self, idx):
019|         '''Return a Paragraph by index.'''
020|         return self.paras[idx]
021| 
022|     def __iter__ (self):
023|         '''Return an iterator over the Section paragraphs.'''
024|         return iter(self.paras)
025| 
026|     def __repr__ (self):
027|         '''Return a REPR string.'''
028|         return f'<{typename(self)} @{id(self):08x}>'
029| 
030|     def __str__ (self):
031|         '''Section string is its paragraphs joined by blank lines.'''
032|         return '\n\n'.join(str(p) for p in self.paras)
033| 
034| 
035| if __name__ == '__main__':
036|     text = """\
036| Hello. Just a brief paragraph. Good-bye.
036| 
036| Hello, again. I'm back.
036| 
036| But now I'm leaving. See you later."""

037|     # New Section instance…
038|     sec = Section(text)
039| 
040|     print(f'String: {sec!s}')
041|     print(f'REPR:   {sec!r}')
042|     print(f'bool:   {bool(sec)}')
043|     print(f'len:    {len(sec)}')
044|     print(f'[2]:    {sec[2]}')
045|     print()
046|     for ix,par in enumerate(sec):
047|         print(f'{ix}: {par}')
048|     print()
049| 

Same pattern, and when run, this prints:

String: Hello. Just a brief paragraph. Good-bye.

Hello, again. I'm back.

But now I'm leaving. See you later.
REPR:   <Section @13c474ccdc0>
bool:   True
len:    3
[2]:    But now I'm leaving. See you later.

0: Hello. Just a brief paragraph. Good-bye.
1: Hello, again. I'm back.
2: But now I'm leaving. See you later.

Which seems basically okay. Though we know it has flaws, it works for very basic text — specifically, text with sentences that always end with periods.

We can exercise the code with a bit more text:

001| from examples import Section
002| 
003| DemoText = """\
003| Some of our group was already in Boston when the six of us arrived.
003| Others arrived slightly later or the next day. Those who were there
003| met in the Lobby to go hunt dinner. The convention is held annually,
003| so many of our group knew Boston and had ideas about interesting
003| places to go. This, to me, is better than eating at the hotel or
003| trying to find someplace nearby. When I travel, I hate eating or
003| doing anything I could do at home, so I was delighted to explore.
003| 
003| Lunch each of the four days was served in a large room. I have to
003| give Marriott low marks for its conference lunch service. That first
003| day, they served a shrimp/scallops-in-cream-sauce thing that, while
003| I liked it, is problematic. Some people don’t care for seafood or
003| suffer allergic reactions to it. Not a great idea for a mass lunch,
003| I think. (Although it is Boston.) One member of our group doesn’t eat
003| things that “live in shells” (interesting rule), and had to wait
003| nearly to the end of lunch to get her alternate."""

004| 
005| def main ():
006|     print(DemoText)
007|     print()
008| 
009|     txtobj = Section(DemoText)
010|     print(txtobj)
011|     print()
012| 
013|     for px,para in enumerate(txtobj):
014|         print(f'Paragraph {px}')
015| 
016|         for sx,sent in enumerate(para):
017|             print(f'Sentence {sx}')
018| 
019|             for wx,word in enumerate(sent):
020|                 print(f'{wx:2d}: {word}')
021| 
022|             print()
023|         print()
024|     print()
025| 
026|     # Return the text object…
027|     return txtobj
028| 
029| 
030| if __name__ == '__main__':
031|     print()
032|     txt = main()
033|     print()
034| 

When run, this prints … well, quite a lot and more than I want to show here.

One key point is that the string version of txtobj (the Section instance) prints the section reasonably accurately. Note how it cleans up any double spaces and ignores single newlines embedded in the text:

Some of our group was already in Boston when the six of us arrived. Others arrived slightly later or the next day. Those who were there met in the Lobby to go hunt dinner. The convention is held annually, so many of our group knew Boston and had ideas about interesting places to go. This, to me, is better than eating at the hotel or trying to find someplace nearby. When I travel, I hate eating or doing anything I could do at home, so I was delighted to explore.

Lunch each of the four days was served in a large room. I have to give Marriott low marks for its conference lunch service. That first day, they served a shrimp/scallops-in-cream-sauce thing that, while I liked it, is problematic. Some people don’t care for seafood or suffer allergic reactions to it. Not a great idea for a mass lunch, I think. (Although it is Boston. ) One member of our group doesn’t eat things that “live in shells” (interesting rule), and had to wait nearly to the end of lunch to get her alternate.

Not bad, but there’s a stray space before the closing parenthesis in “(Although it is Boston. )” that shouldn’t be there. A close look at the output from the iteration over the paragraphs, sentences, and words shows that we’re not really parsing this as well as we’d like.
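The stray space can be reproduced with plain strings. This sketch mimics what the Paragraph and Sentence classes do (split on periods, normalize whitespace, tack the periods back on) using a shortened, made-up version of the sample text:

```python
text = "I think. (Although it is Boston.) One member waits."

# Split on periods and drop empty pieces, as Paragraph.__init__ does.
pieces = [s for s in text.split('.') if 0 < len(s)]

# Normalize whitespace in each piece the way Sentence does (split, then join).
sentences = [' '.join(p.split()) for p in pieces]
print(sentences)
# ['I think', '(Although it is Boston', ') One member waits']

# Re-adding the periods puts one between "Boston" and the ")" that got
# pushed into the following sentence.
rebuilt = ' '.join(f'{s}.' for s in sentences)
print(rebuilt)
# I think. (Although it is Boston. ) One member waits.
```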


Let’s see if we can improve things.

An obvious observation is that the __repr__ method is the same in each class. This immediately suggests something like this:

001| class TextObject:
002|     '''Basic Text Object. Implements repr.'''
003| 
004|     def __repr__ (self):
005|         return f'<{type(self).__name__} @{id(self):08x}>'
006| 
007| 
008| class Word (TextObject):
009| 
010|     def __init__ (self, text):
011|         '''Initialize new Word instance.'''
012|         self.chars = text
013| 
014|     def __bool__ (self):
015|         '''A Word is True if it has non-zero length.'''
016|         return (0 < len(self.chars))
017| 
018|     def __len__ (self):
019|         '''Return Word length.'''
020|         return len(self.chars)
021| 
022|     def __getitem__ (self, idx):
023|         '''Return a character by index.'''
024|         return self.chars[idx]
025| 
026|     def __iter__ (self):
027|         '''Return an iterator over the Word characters.'''
028|         return iter(self.chars)
029| 
030|     def __str__ (self):
031|         '''Word string is just its characters.'''
032|         return self.chars
033| 

And likewise for the other classes. The new TextObject (abstract) base class implements the common __repr__ method. Note that we no longer need the typename function since we’re implementing __repr__ only once and can implement what we need there.

But hold on, the classes have a lot more in common. Perhaps we can have TextObject implement even more. Like this:

001| class TextObject:
002| 
003|     def __init__ (self):
004|         '''Initialize new TextObject instance.'''
005|         self.content = None
006| 
007|     def __bool__ (self):
008|         '''True if non-zero length.'''
009|         return (0 < len(self.content))
010| 
011|     def __len__ (self):
012|         '''Return length.'''
013|         return len(self.content)
014| 
015|     def __getitem__ (self, idx):
016|         '''Return a content item by index.'''
017|         return self.content[idx]
018| 
019|     def __iter__ (self):
020|         '''Return an iterator over the content.'''
021|         return iter(self.content)
022| 
023|     def __repr__ (self):
024|         '''Generic REPR string.'''
025|         return f'<{type(self).__name__} @{id(self):08x}>'
026| 
027| 
028| class Word2 (TextObject):
029|     '''Implements a Word as a list of characters.'''
030| 
031|     def __init__ (self, text):
032|         '''Initialize new Word instance.'''
033|         self.content = list(text)
034| 
035|     def __str__ (self):
036|         '''Word string is just its characters.'''
037|         return ''.join(self.content)
038| 
039| 
040| class Sentence2 (TextObject):
041|     '''Implements a Sentence as a list of Words.'''
042| 
043|     def __init__ (self, text):
044|         '''Initialize new Sentence instance.'''
045|         self.content = [Word2(w) for w in text.split()]
046| 
047|     def __str__ (self):
048|         '''Sentence string is its words joined by spaces.'''
049|         return ' '.join(str(w) for w in self.content)
050| 
051| 
052| class Paragraph2 (TextObject):
053|     '''Implements a Paragraph as a list of Sentences.'''
054| 
055|     def __init__ (self, text):
056|         '''Initialize new Paragraph instance.'''
057|         self.content = [Sentence2(s) for s in text.split('.') if 0 < len(s)]
058| 
059|     def __getitem__ (self, idx):
060|         '''Return a sentence (as a string, period restored) by index.'''
061|         return f'{self.content[idx]}.'
062| 
063|     def __str__ (self):
064|         '''Paragraph string is its sentences joined by spaces.'''
065|         return ' '.join(f'{s}.' for s in self.content)
066| 
067| 
068| class Section2 (TextObject):
069|     '''Implements a Section as a list of Paragraphs.'''
070| 
071|     def __init__ (self, text):
072|         '''Initialize new Section instance.'''
073|         self.content = [Paragraph2(p) for p in text.split('\n\n')]
074| 
075|     def __str__ (self):
076|         '''Section string is its paragraphs joined by blank lines.'''
077|         return '\n\n'.join(str(p) for p in self.content)
078| 
079| 
080| DemoText = """\
080| Some of our group was already in Boston when the six of us arrived.
080| Others arrived slightly later or the next day. Those who were there
080| met in the Lobby to go hunt dinner. The convention is held annually,
080| so many of our group knew Boston and had ideas about interesting
080| places to go. This, to me, is better than eating at the hotel or
080| trying to find someplace nearby. When I travel, I hate eating or
080| doing anything I could do at home, so I was delighted to explore.
080| 
080| Lunch each of the four days was served in a large room. I have to
080| give Marriott low marks for its conference lunch service. That first
080| day, they served a shrimp/scallops-in-cream-sauce thing that, while
080| I liked it, is problematic. Some people don’t care for seafood or
080| suffer allergic reactions to it. Not a great idea for a mass lunch,
080| I think. (Although it is Boston.) One member of our group doesn’t eat
080| things that “live in shells” (interesting rule), and had to wait
080| nearly to the end of lunch to get her alternate."""

081| 
082| if __name__ == '__main__':
083|     print()
084| 
085|     txtobj = Section2(DemoText)
086|     print(txtobj)
087|     print()
088| 

The key is using the self.content attribute in all classes. This lets us also implement common Boolean value (lines #7 to #9), length (lines #11 to #13), indexing (lines #15 to #17), and iteration (lines #19 to #21) methods.

It also makes the classes themselves much smaller because each one only needs to implement the __init__ and __str__ methods. The one exception is Paragraph2, which also overrides __getitem__ (lines #59 to #61) because we need to tack the periods back onto the sentences.
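As a sketch of how little a new subclass needs under this pattern, here is a condensed stand-in for the base class plus a hypothetical CsvLine class (not part of the original code). It also shows a detail of Python worth knowing: when a class defines __len__ but not __bool__, truth testing falls back to the length, so this condensed base gets by without __bool__.

```python
class TextObject:
    """Condensed stand-in for the base class: delegates to self.content."""
    def __len__(self):
        return len(self.content)
    def __getitem__(self, idx):
        return self.content[idx]
    def __iter__(self):
        return iter(self.content)

class CsvLine(TextObject):
    """Hypothetical subclass: a comma-separated line as a list of fields."""
    def __init__(self, text):
        self.content = [field.strip() for field in text.split(',')]
    def __str__(self):
        return ', '.join(self.content)

line = CsvLine("red,green , blue")
print(len(line))     # 3
print(line[1])       # green
print(str(line))     # red, green, blue
print(bool(line))    # True (falls back to __len__)
```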


Which brings us back to the issue regarding assuming all sentences end with periods. They clearly do not, so let’s fix that now. Here’s an improved version of the Paragraph class:

001| from examples import TextObject, Sentence2
002| 
003| class Paragraph3 (TextObject):
004|     '''Implements a Paragraph as a list of Sentences.'''
005| 
006|     def __init__ (self, text):
007|         '''Initialize new Paragraph instance. (Improved version.)'''
008|         self.content = []
009| 
010|         buf = []
011|         for char in text:
012|             if char in ['.', '!', '?']:
013|                 buf.append(char)
014|                 self.content.append(Sentence2(''.join(buf).strip()))
015|                 buf = []
016|             else:
017|                 buf.append(char)
018| 
019|         if 0 < len(buf):
020|             self.content.append(Sentence2(''.join(buf).strip()))
021| 
022|     def __str__ (self):
023|         '''Paragraph string is its sentences joined by spaces.'''
024|         text = ' '.join(str(s) for s in self.content)
025|         # Special handling for parentheses…
026|         text = text.replace('( ', '(')
027|         text = text.replace(' )', ')')
028|         return text
029| 
030| 
031| if __name__ == '__main__':
032| 
033|     # New Paragraph instance…
034|     par = Paragraph3("Hello, World! How are you. I am fine. Good-bye now.")
035| 
036|     print(f'String: {par!s}')
037|     print(f'REPR:   {par!r}')
038|     print(f'bool:   {bool(par)}')
039|     print(f'len:    {len(par)}')
040|     print(f'[2]:    {par[2]}')
041|     print()
042|     for ix,snt in enumerate(par):
043|         print(f'{ix}: {par[ix]}')
044|         print(f'{ix}: {snt}')
045|     print()
046| 

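The new __init__ method (lines #6 to #20) scans the text one character at a time and ends a sentence whenever it sees a period, exclamation mark, or question mark, so each sentence keeps its own ending punctuation. We also add some code to get rid of any spaces just inside parentheses (lines #25 to #27). When run, now we get: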

String: Hello, World! How are you. I am fine. Good-bye now.
REPR:   <Paragraph3 @2823a305780>
bool:   True
len:    4
[2]:    I am fine.

0: Hello, World!
0: Hello, World!
1: How are you.
1: How are you.
2: I am fine.
2: I am fine.
3: Good-bye now.
3: Good-bye now.

Giving us consistent results between indexing and iterating, plus sentences now preserve their ending punctuation. Not a bad win, and the code still has a delightfully small size (in fact, smaller than it started).
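For comparison only, the same kind of splitting can be sketched with a regular expression. This is an alternative technique, not the code used in this post, and the function name split_sentences is invented here. It gives the same four sentences for the sample text, but it handles some edge cases differently (for example, it silently drops the second mark in a run like “?!”).

```python
import re

def split_sentences(text):
    """Split text into sentences that keep their ending punctuation."""
    # One or more non-terminator characters, then an optional terminator.
    pieces = re.findall(r'[^.!?]+[.!?]?', text)
    # Strip surrounding whitespace; skip pieces that were only whitespace.
    return [p.strip() for p in pieces if p.strip()]

result = split_sentences("Hello, World! How are you. I am fine. Good-bye now.")
print(result)
# ['Hello, World!', 'How are you.', 'I am fine.', 'Good-bye now.']
```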

If we reimplement Section2 (as Section3) to use Paragraph3, then we can repeat our exercise:

001| from examples import Section3
002| 
003| DemoText = """\
003| Some of our group was already in Boston when the six of us arrived.
003| Others arrived slightly later or the next day. Those who were there
003| met in the Lobby to go hunt dinner. The convention is held annually,
003| so many of our group knew Boston and had ideas about interesting
003| places to go. This, to me, is better than eating at the hotel or
003| trying to find someplace nearby. When I travel, I hate eating or
003| doing anything I could do at home, so I was delighted to explore.
003| 
003| Lunch each of the four days was served in a large room. I have to
003| give Marriott low marks for its conference lunch service. That first
003| day, they served a shrimp/scallops-in-cream-sauce thing that, while
003| I liked it, is problematic. Some people don’t care for seafood or
003| suffer allergic reactions to it. Not a great idea for a mass lunch,
003| I think. (Although it is Boston.) One member of our group doesn’t eat
003| things that “live in shells” (interesting rule), and had to wait
003| nearly to the end of lunch to get her alternate."""

004| 
005| if __name__ == '__main__':
006| 
007|     txtobj = Section3(DemoText)
008|     print(txtobj)
009|     print()
010| 
011|     sent = txtobj[0][2]
012|     word = txtobj[0][2][4]
013|     char = txtobj[0][2][4][1]
014| 
015|     print(f'{sent=!s}')
016|     print(f'{word=!s}')
017|     print(f'{char=!s}')
018|     print()
019| 
020|     # Unpack text objects into known # of objects…
021|     para0,para1,*_ = txtobj
022|     sent0,sent1,*_ = para1
023|     word0,word1,*_ = sent1
024|     char0,char1,*_ = word1
025|     print(f'{sent1=!s}')
026|     print(f'{word1=!s}')
027|     print(f'{char1=!s}')
028|     print()
029| 

This version also tests some chained indexing (lines #11 to #13) and unpacking directly into variables (lines #21 to #24).

And now we have Section objects made of Paragraph objects made of Sentence objects made of Word objects made of characters (which in Python are objects). As the example above illustrates, any of these objects can be indexed.


A final improvement we might make is to recognize that, since we’re delegating to a list object in each class here, why not just subclass list?

Indeed, why not:

001| class TextObject4 (list):
002| 
003|     def __repr__ (self):
004|         '''Generic REPR string.'''
005|         return f'<{type(self).__name__} @{id(self):08x}>'
006| 
007| class Word4 (TextObject4):
008|     '''Implements a Word as a list of characters.'''
009| 
010|     def __init__ (self, text):
011|         '''Initialize new Word instance.'''
012|         super().__init__(text)
013| 
014|     def __str__ (self):
015|         '''Word string is just its characters.'''
016|         return ''.join(self)
017| 
018| class Sentence4 (TextObject4):
019|     '''Implements a Sentence as a list of Words.'''
020| 
021|     def __init__ (self, text):
022|         '''Initialize new Sentence instance.'''
023|         super().__init__(Word4(w) for w in text.split())
024| 
025|     def __str__ (self):
026|         '''Sentence string is its words joined by spaces.'''
027|         return ' '.join(str(w) for w in self)
028| 
029| class Paragraph4 (TextObject4):
030|     '''Implements a Paragraph as a list of Sentences.'''
031| 
032|     def __init__ (self, text):
033|         '''Initialize new Paragraph instance.'''
034|         buf = []
035|         tmp = []
036|         for char in text:
037|             if char in ['.', '!', '?']:
038|                 tmp.append(char)
039|                 buf.append(Sentence4(''.join(tmp).strip()))
040|                 tmp = []
041|             else:
042|                 tmp.append(char)
043| 
044|         if 0 < len(tmp):
045|             buf.append(Sentence4(''.join(tmp).strip()))
046|         super().__init__(buf)
047| 
048|     def __str__ (self):
049|         '''Paragraph string is its sentences joined by spaces.'''
050|         text = ' '.join(str(s) for s in self)
051|         # Special handling for parentheses…
052|         text = text.replace('( ', '(')
053|         text = text.replace(' )', ')')
054|         return text
055| 
056| class Section4 (TextObject4):
057|     '''Implements a Section as a list of Paragraphs.'''
058| 
059|     def __init__ (self, text):
060|         '''Initialize new Section instance.'''
061|         super().__init__(Paragraph4(p) for p in text.split('\n\n'))
062| 
063|     def __str__ (self):
064|         '''Section string is its paragraphs joined by blank lines.'''
065|         return '\n\n'.join(str(p) for p in self)
066| 

Now the classes are really short. They all inherit from TextObject4, which inherits from list. Now each object is a list and inherits all list methods. And no more delegating, so quite a win.
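Here is a small sketch of what inheriting from list buys, along with one caveat. The Words class is a hypothetical stand-in for the classes above.

```python
class Words(list):
    """Hypothetical list subclass, standing in for the classes above."""
    def __str__(self):
        return ' '.join(self)

w = Words(["good", "people", "party"])

# Inherited list methods work with no extra code.
w.append("now")
w.reverse()
print(str(w))            # now party people good
print("party" in w)      # True

# Caveat: slicing returns a plain list, not a Words instance.
sliced = w[:2]
print(type(sliced).__name__)   # list
```

If slices need to stay text objects, __getitem__ would have to be overridden to wrap slice results, which brings back a little of the code the subclassing removed.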

The ability to do things like this — to first create a series of related classes, then extract common functionality into a base class — is a key reason I’m sold on object-oriented design. That we can also leverage the functionality of well-designed existing classes is another reason.

Hope you found this useful. Or at least fun.


Link: Zip file containing all code fragments used in this post.