bag.text package

Module contents

Functions to manipulate strings.

bag.text.break_lines_near(text, length, leeway=4, whitespace=' \r\n\t', end_line_break='…', start_line_break='…')

Break text into lines of at most length characters and return them as a list.

  • text: the string to break

  • length: maximum length of each line

  • leeway: how far to search for whitespace near the break point

  • whitespace: characters considered whitespace

  • end_line_break: string appended to a line that ends in the middle of a broken word

  • start_line_break: string prepended to the line that continues a broken word

Return type

List[str]
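The exact algorithm is not documented here. The following is a minimal greedy sketch, assuming that leeway means "look back at most that many characters from the limit for whitespace" and that a word is split only when no whitespace is found. The name break_lines_near_sketch is hypothetical; this is not bag's implementation.

```python
from typing import List


def break_lines_near_sketch(
    text: str,
    length: int,
    leeway: int = 4,
    whitespace: str = " \r\n\t",
    end_line_break: str = "…",
    start_line_break: str = "…",
) -> List[str]:
    """Greedily break text into lines of at most `length` characters."""
    # Guard: the markers must leave room for real text, or we never progress.
    if length <= len(end_line_break) + len(start_line_break):
        raise ValueError("length too small for the break markers")
    lines: List[str] = []
    rest = text
    while len(rest) > length:
        # Assumption: search backwards from the limit, at most `leeway` chars.
        cut = -1
        for i in range(length, max(length - leeway, 0) - 1, -1):
            if rest[i] in whitespace:
                cut = i
                break
        if cut > 0:
            # Break at the whitespace and drop it from both sides.
            line = rest[:cut].rstrip(whitespace)
            if line:
                lines.append(line)
            rest = rest[cut:].lstrip(whitespace)
        else:
            # No whitespace nearby: split the word and mark both halves.
            keep = length - len(end_line_break)
            lines.append(rest[:keep] + end_line_break)
            rest = start_line_break + rest[keep:]
    if rest:
        lines.append(rest)
    return lines
```

For example, break_lines_near_sketch("hello world", 8) breaks at the space, while a long word with no whitespace in reach is split and marked with the ellipsis on both sides.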

bag.text.capitalize(txt)

Strip surrounding whitespace, then convert only the first character to upper case.

This function can be used as a Colander preparer.

Return type

str
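"Only the first character" is what distinguishes this from the built-in str.capitalize(), which also lower-cases the rest of the string. A minimal sketch of the documented behaviour, under the hypothetical name capitalize_sketch (bag may treat non-string input differently):

```python
def capitalize_sketch(txt: str) -> str:
    """Strip surrounding whitespace, then upper-case only the first character."""
    txt = txt.strip()
    # Unlike str.capitalize(), the rest of the string is left untouched.
    return txt[:1].upper() + txt[1:]
```

So capitalize_sketch("  hello World ") gives "Hello World", whereas "hello World".capitalize() gives "Hello world".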

bag.text.content_of(paths, encoding='utf-8', sep='\n')

Read the contents of paths, join them with sep and return the result.

This makes it easy to read either one file or many.
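A minimal sketch of the described behaviour, assuming that a single str or pathlib.Path is treated as a one-element list. The name content_of_sketch is hypothetical.

```python
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def content_of_sketch(
    paths: Union[PathLike, Iterable[PathLike]],
    encoding: str = "utf-8",
    sep: str = "\n",
) -> str:
    """Read one or many files and join their contents with `sep`."""
    # Assumption: a lone path is accepted as well as a sequence of paths.
    if isinstance(paths, (str, Path)):
        paths = [paths]
    return sep.join(Path(p).read_text(encoding=encoding) for p in paths)
```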

bag.text.find_new_title(dir, filename)

Return a path in dir that does not exist yet.

If filename exists in dir, adds or changes the end of the file title until a name is found that doesn’t yet exist.

For instance, if file “Image (01).jpg” exists in “somedir”, returns “somedir/Image (02).jpg”.

Return type

str
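One way to get the documented "Image (01).jpg" to "Image (02).jpg" behaviour is to increment a trailing "(NN)" counter, keeping its zero padding. This sketch assumes that a name without a counter gets " (01)" appended; find_new_title_sketch is a hypothetical name and bag's actual numbering may differ.

```python
import os
import re

# Matches a file title ending in a parenthesised counter, e.g. "Image (01)".
_COUNTER = re.compile(r"^(?P<stem>.*)\((?P<num>\d+)\)$")


def find_new_title_sketch(dir: str, filename: str) -> str:
    """Return a path in `dir` whose name is not taken yet."""
    path = os.path.join(dir, filename)
    while os.path.exists(path):
        root, ext = os.path.splitext(os.path.basename(path))
        match = _COUNTER.match(root)
        if match:
            # "Image (01)" -> "Image (02)", keeping the zero padding width.
            num = match.group("num")
            new_num = str(int(num) + 1).zfill(len(num))
            root = f"{match.group('stem')}({new_num})"
        else:
            # Assumption: no counter yet, so start one ("Image" -> "Image (01)").
            root = f"{root} (01)"
        path = os.path.join(dir, root + ext)
    return path
```

Note that any check-then-create approach like this one is racy: another process may create the same file between the existence check and the moment the caller writes to the returned path.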

bag.text.keep_digits(txt)

Discard from txt all non-numeric characters.

Return type

str
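A minimal sketch, assuming "numeric" means the ASCII digits 0 to 9. The built-in str.isdigit() is broader (it also accepts characters such as "²"), so which definition bag uses matters for non-ASCII input. The name keep_digits_sketch is hypothetical.

```python
def keep_digits_sketch(txt: str) -> str:
    """Keep only the characters of `txt` that are ASCII digits 0-9."""
    # Assumption: restrict to "0123456789" rather than using str.isdigit().
    return "".join(ch for ch in txt if ch in "0123456789")
```

For instance, keep_digits_sketch("(11) 98765-4321") returns "11987654321".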

bag.text.parse_iso_date(txt)

Parse a datetime in ISO format.

Return type

datetime
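The documentation does not say which parser bag uses or which ISO 8601 variants it accepts. As a standard-library sketch (hypothetical name parse_iso_date_sketch), datetime.fromisoformat covers the common forms; before Python 3.11 it rejects a trailing "Z", so this sketch rewrites it as "+00:00".

```python
from datetime import datetime


def parse_iso_date_sketch(txt: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-05-17T13:45:00'."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset for older interpreters.
    if txt.endswith("Z"):
        txt = txt[:-1] + "+00:00"
    return datetime.fromisoformat(txt)
```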

bag.text.pluralize(singular)

Return plural form of given lowercase singular word (English only).

Based on ActiveState recipe http://code.activestate.com/recipes/413172/

>>> pluralize('')
''
>>> pluralize('goose')
'geese'
>>> pluralize('dolly')
'dollies'
>>> pluralize('genius')
'genii'
>>> pluralize('jones')
'joneses'
>>> pluralize('pass')
'passes'
>>> pluralize('zero')
'zeros'
>>> pluralize('casino')
'casinos'
>>> pluralize('hero')
'heroes'
>>> pluralize('church')
'churches'
>>> pluralize('x')
'xs'
>>> pluralize('car')
'cars'

Return type

str

bag.text.random_string(length, chars='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789')

Return a random string of length characters chosen from chars.

Return type

str
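A minimal sketch of the documented behaviour (hypothetical name random_string_sketch). It draws from the secrets module, which is suitable for tokens and passwords; the documentation does not say whether bag uses secrets or the non-cryptographic random module, so do not assume the original is safe for security purposes.

```python
import secrets
import string

# Same characters as the documented default for `chars`.
DEFAULT_CHARS = string.ascii_uppercase + string.ascii_lowercase + string.digits


def random_string_sketch(length: int, chars: str = DEFAULT_CHARS) -> str:
    """Return `length` characters drawn independently from `chars`."""
    # secrets.choice uses the operating system's CSPRNG.
    return "".join(secrets.choice(chars) for _ in range(length))
```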

bag.text.resist_bad_encoding(txt, possible_encodings=('utf8', 'iso-8859-1'))

Use this to try to avoid errors from text whose encoding is unknown, when erroring out would be worse than possibly displaying garbage.

Maybe we should use the chardet library instead…
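A minimal sketch of the idea, assuming txt may be bytes to be decoded: try each of possible_encodings in order and, as a last resort, decode with replacement characters instead of raising. The name resist_bad_encoding_sketch is hypothetical and bag's behaviour may differ. Since ISO-8859-1 maps every byte to a character, with the default encodings the final fallback is never reached.

```python
from typing import Sequence, Union


def resist_bad_encoding_sketch(
    txt: Union[bytes, str],
    possible_encodings: Sequence[str] = ("utf8", "iso-8859-1"),
) -> str:
    """Decode bytes of unknown encoding, preferring garbage over exceptions."""
    if isinstance(txt, str):
        return txt  # already decoded, nothing to do
    for encoding in possible_encodings:
        try:
            return txt.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Last resort: never raise; undecodable bytes become U+FFFD.
    return txt.decode("utf8", errors="replace")
```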

bag.text.shorten(txt, length=10, ellipsis='…')

Truncate txt to a total of length characters, including the ellipsis added to the end.

Return type

str
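A minimal sketch of the documented rule, assuming text that already fits is returned unchanged and that length is at least len(ellipsis). The name shorten_sketch is hypothetical.

```python
def shorten_sketch(txt: str, length: int = 10, ellipsis: str = "…") -> str:
    """Truncate `txt` so the result, ellipsis included, fits in `length`."""
    if len(txt) <= length:
        return txt  # short enough: returned unchanged
    # Reserve room for the ellipsis so the total is exactly `length`.
    return txt[: length - len(ellipsis)] + ellipsis
```

For example, shorten_sketch("Hello, world!") returns "Hello, wo…", which is 10 characters long.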

bag.text.shorten_proper(name, length=11, ellipsis='…', min=None)

Shorten a proper name for display.

Return type

str

bag.text.simplify_chars(txt, encoding='ascii', byts=False, amap=None)

Remove from txt all characters not supported by encoding, but use a map to “simplify” some characters instead of just removing them.

If byts is true, return a bytestring.
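One common way to implement this kind of simplification combines an explicit character map with Unicode decomposition (NFKD), so that "é" becomes "e" once its combining accent is dropped. This is a sketch of that technique, not necessarily bag's; the name simplify_chars_sketch and the contents of EXAMPLE_MAP are hypothetical, and it assumes amap is a dict from characters to replacement strings.

```python
import unicodedata
from typing import Dict, Optional, Union

# Hypothetical example map; bag's own default map is not documented here.
EXAMPLE_MAP: Dict[str, str] = {"ß": "ss", "æ": "ae", "Æ": "AE", "“": '"', "”": '"'}


def simplify_chars_sketch(
    txt: str,
    encoding: str = "ascii",
    byts: bool = False,
    amap: Optional[Dict[str, str]] = None,
) -> Union[str, bytes]:
    amap = EXAMPLE_MAP if amap is None else amap
    pieces = []
    for ch in txt:
        ch = amap.get(ch, ch)  # explicit simplification first
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Decompose e.g. "é" into "e" + combining accent; the accent is
            # then dropped below because it is not encodable either.
            ch = unicodedata.normalize("NFKD", ch)
        pieces.append(ch)
    # Anything still unsupported by `encoding` is silently removed.
    data = "".join(pieces).encode(encoding, errors="ignore")
    return data if byts else data.decode(encoding)
```

Decomposing only characters the target encoding cannot represent keeps, for example, "é" intact when encoding is "latin1".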

bag.text.slugify(txt, exists=<function <lambda>>, badchars='', maxlength=16, chars='abcdefghijklmnopqrstuvwxyz23456789', min_suffix_length=1, max_suffix_length=4)

Return a slug, based on txt, that does not exist yet.

You may provide exists, a callback that takes a generated slug and returns whether it already exists (for instance, by querying the database).

Each attempt generates a longer suffix in order to keep the number of attempts to a minimum.

Return type

str
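The growing-suffix strategy can be sketched as follows. Assumptions, all labelled: the base slug is simply the lower-cased ASCII words joined by hyphens (non-ASCII letters are dropped, and badchars is not modelled), suffixes are random characters from chars, and the sketch raises RuntimeError once max_suffix_length has been tried. The name slugify_sketch is hypothetical.

```python
import random
import re
from typing import Callable

SUFFIX_CHARS = "abcdefghijklmnopqrstuvwxyz23456789"


def slugify_sketch(
    txt: str,
    exists: Callable[[str], bool] = lambda slug: False,
    maxlength: int = 16,
    chars: str = SUFFIX_CHARS,
    min_suffix_length: int = 1,
    max_suffix_length: int = 4,
) -> str:
    # Assumption: simplistic base slug of lower-case ASCII words and hyphens.
    base = "-".join(re.findall(r"[a-z0-9]+", txt.lower()))[:maxlength].rstrip("-")
    if not exists(base):
        return base
    # Each retry uses a longer random suffix, so collisions become unlikely fast.
    for suffix_length in range(min_suffix_length, max_suffix_length + 1):
        suffix = "".join(random.choice(chars) for _ in range(suffix_length))
        stem = base[: maxlength - suffix_length - 1].rstrip("-")
        candidate = f"{stem}-{suffix}"
        if not exists(candidate):
            return candidate
    # Assumption: give up loudly rather than loop forever.
    raise RuntimeError(f"could not find a free slug for {txt!r}")
```

Growing the suffix means each extra attempt multiplies the number of possible candidates by len(chars), which is why few attempts are usually needed.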

bag.text.strip_lower_preparer(value)

Colander preparer that trims whitespace and converts to lowercase.

bag.text.strip_preparer(value)

Colander preparer that trims whitespace around value.
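A Colander preparer is a callable that takes the deserialized value and returns a transformed one; it is attached with the preparer= argument of colander.SchemaNode. Because a preparer may receive values that are not strings (such as colander.null for a missing field), this sketch transforms strings only. The names ending in _sketch are hypothetical, and bag's handling of non-strings is not documented here.

```python
from typing import Any


def strip_preparer_sketch(value: Any) -> Any:
    """Trim surrounding whitespace; leave non-strings untouched."""
    # Non-string values (e.g. a missing-value sentinel) pass through as-is.
    return value.strip() if isinstance(value, str) else value


def strip_lower_preparer_sketch(value: Any) -> Any:
    """Trim surrounding whitespace and convert to lower case."""
    value = strip_preparer_sketch(value)
    return value.lower() if isinstance(value, str) else value
```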

bag.text.to_filename(txt, for_web=False, badchars='', maxlength=0, encoding='latin1')

Transform txt into a string that is suitable for use as a filename.

Return type

str

bag.text.uncommafy(txt, sep=',')

Generate the elements of a comma-separated string.

Splits txt on sep and returns a generator of the stripped elements. Empty elements are skipped.

Return type

Generator[str, None, None]
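A minimal generator sketch of the documented behaviour, under the hypothetical name uncommafy_sketch. Being lazy, it only does work as the caller iterates.

```python
from typing import Generator


def uncommafy_sketch(txt: str, sep: str = ",") -> Generator[str, None, None]:
    """Yield the stripped, non-empty elements of a `sep`-separated string."""
    for item in txt.split(sep):
        item = item.strip()
        if item:  # skip empty elements such as those produced by "a,,b"
            yield item
```

For example, list(uncommafy_sketch(" a, b,,c ,")) returns ["a", "b", "c"].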