bag.email_validator module

The ultimate functions for domain validation and e-mail address validation.

Why not just use a regular expression?

http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx

There are many regular expressions out there for this. The “perfect one” is several KB long and therefore unmaintainable (Perl people wrote it…).

This is 2009 and domain rules are changing too. Impossible domain names have become possible, international domain names are real…

So validating an e-mail address is more complex than you might think. Take a look at some of the rules: http://en.wikipedia.org/wiki/E-mail_address#RFC_specification

How to do it then?

I believe the solution should combine simple regular expressions with imperative programming.
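As a rough illustration of that idea (not this module's actual code), the sketch below uses one small regular expression only to split the address, and plain if statements for the rest. The names SPLIT_REGEX and check_email are made up for this example, and the rules shown are a small, assumed subset of what a real validator would check.

```python
import re

# Hypothetical, deliberately simple pattern: it only splits the address
# into a local part and a domain. The real checks happen in plain code.
SPLIT_REGEX = re.compile(r'^(?P<local>[^@\s]+)@(?P<domain>[^@\s]+)$')


def check_email(address):
    """Return (address, error); error is '' when no problem was found."""
    address = address.strip()
    match = SPLIT_REGEX.match(address)
    if match is None:
        return address, 'Expected exactly one "@" and no whitespace.'
    local = match.group('local')
    domain = match.group('domain')
    # Imperative rules that would be painful to encode in a single regex:
    if len(local) > 64:
        return address, 'The part before the "@" is too long.'
    if local.startswith('.') or local.endswith('.') or '..' in local:
        return address, 'Misplaced dot before the "@".'
    if ('.' not in domain or domain.startswith('.')
            or domain.endswith('.') or '..' in domain):
        return address, 'The domain looks malformed.'
    return address, ''


if __name__ == '__main__':
    for candidate in ['user@example.com', 'user@.example.com', 'a@b@c.com']:
        print(candidate, '->', check_email(candidate)[1] or 'OK')
```

Each rule stays readable and can carry its own error message, which is hard to get from one giant expression.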

E-mail validation is also dependent on the robustness principle: “Be conservative in what you do, be liberal in what you accept from others.” http://en.wikipedia.org/wiki/Postel%27s_law

This module recognizes that e-mail validation can be done in several different ways, according to purpose:

1) Most of the time you just want validation according to the standard rules. So just say: v = EmailValidator()

2) If you are creating e-mail addresses for your server or your organization, you might need to satisfy a stricter policy such as “dash is not allowed in email addresses”. The EmailValidator constructor accepts a local_part_chars argument, listing the special characters accepted in the local part, and builds the right regular expression for you. Example (no dash): v = EmailValidator(local_part_chars='.+_')

3) What about typos? An erroneous dot at the end of a typed email is typical. Other common errors with the dots revolve around the @: user@.domain.com. These typing mistakes can be automatically corrected, saving you from doing it manually. For this you use the fix flag when instantiating a validator:

d = DomainValidator(fix=True)
domain, error_message = d.validate('.supercalifragilistic.com.br')
if error_message:
    print('Invalid domain:', domain)
else:
    print('Valid domain:', domain)

4) Paranoid people may wish to verify that the given domain actually exists. For that you can pass a lookup_dns='a' argument to the constructor, or even lookup_dns='mx' to verify that the domain actually has e-mail servers. TODO: Squash the bugs in this feature! To use it, you need to install the pydns library:

pip install pydns
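For purpose 3 above, the dot corrections could look roughly like the sketch below. This is only an assumption about the kind of cleanup that fix=True performs, not the module's real code, and fix_dots is a hypothetical helper name.

```python
def fix_dots(text):
    """Remove stray dots at either end of text and next to the "@"."""
    # Trim whitespace first, then dots at the very start or end.
    text = text.strip().strip('.')
    if '@' in text:
        # Split on the last "@" and drop dots that touch it on either side.
        local, _, domain = text.rpartition('@')
        text = local.rstrip('.') + '@' + domain.lstrip('.')
    return text


if __name__ == '__main__':
    for typed in ['user@.domain.com', 'user@domain.com.',
                  '.supercalifragilistic.com.br']:
        print(repr(typed), '->', repr(fix_dots(typed)))
```

A corrected string should still be validated afterwards: the cleanup only removes the typical typos, it does not make an arbitrary string valid.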

How to use

The validating methods return a tuple (email, error_msg). email is the trimmed and perhaps fixed email. error_msg is an empty string when the e-mail is valid.

Typical usage is:

v = EmailValidator()  # or EmailValidator(fix=True)
email = input('Type an email: ')  # raw_input() on Python 2
email, err = v.validate(email)
if err:
    print('Error:', err)
else:
    print('E-mail is valid:', email)  # the email, corrected

There is also an EmailHarvester class to collect e-mail addresses from any text.

See also tests/test_email_validator.py

class bag.email_validator.BaseValidator[source]

Bases: object

validate_or_raise(*a, **k)[source]

Raise ValidationException if validation fails.

Some people would condemn this whole module screaming: “Don’t return success codes, use exceptions!” This method allows them to be happy, too.
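The relationship between the tuple-returning style and the exception style can be sketched as follows. Everything here is a stand-in: ValidationError and TupleValidator are hypothetical names, the "@" check is a toy rule, and returning the cleaned value on success is an assumption rather than documented behaviour of validate_or_raise.

```python
class ValidationError(ValueError):
    """Hypothetical stand-in for the module's ValidationException."""


class TupleValidator(object):
    """Hypothetical validator whose validate() returns (value, error)."""

    def validate(self, value):
        value = value.strip()
        if '@' not in value:
            return value, 'Missing "@".'
        return value, ''

    def validate_or_raise(self, *a, **k):
        """Call validate() and turn a non-empty error into an exception."""
        value, error = self.validate(*a, **k)
        if error:
            raise ValidationError(error)
        return value  # assumption: hand back the cleaned value


if __name__ == '__main__':
    v = TupleValidator()
    print(v.validate_or_raise(' someone@example.com '))
    try:
        v.validate_or_raise('no-at-sign')
    except ValidationError as exc:
        print('Rejected:', exc)
```

Because the exception derives from ValueError, callers that already catch ValueError need no changes.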

class bag.email_validator.DomainValidator(fix=False, lookup_dns=None)[source]

Bases: bag.email_validator.BaseValidator

A domain name validator that is ready for internationalized domains.

http://en.wikipedia.org/wiki/Internationalized_domain_name

http://en.wikipedia.org/wiki/Top-level_domain

domain_pattern = '[\\w]+[\\w\\.\\-]*\\.[\\w]+'
domain_regex = re.compile('^[\\w]+[\\w\\.\\-]*\\.[\\w]+$', re.IGNORECASE)
false_positive_ips = ['208.67.217.132']
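To see what domain_regex accepts on its own, the sketch below compiles the same pattern and tries a few strings. The helper name matches_domain_pattern is made up for illustration, and the validator presumably applies further checks beyond this expression.

```python
import re

# Same pattern as the domain_regex attribute shown above.
domain_regex = re.compile('^[\\w]+[\\w\\.\\-]*\\.[\\w]+$', re.IGNORECASE)


def matches_domain_pattern(domain):
    """Return True if domain matches the pattern (hypothetical helper)."""
    return domain_regex.match(domain) is not None


if __name__ == '__main__':
    samples = ['example.com', 'sub-domain.example.com.br',
               '.example.com', 'example.com.', 'localhost', 'a..b']
    for sample in samples:
        print(sample, matches_domain_pattern(sample))
```

Under Python 3, \w also matches non-ASCII letters in str patterns, which is why the expression copes with internationalized names. Note that the pattern alone still lets through strings such as a..b, so it cannot be the only check.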
lookup_domain(domain, lookup_record=None)[source]

Looks up the DNS record for domain and returns:

  • None if it does not exist,

  • The IP address if looking up the “A” record, or

  • The list of hosts in the “MX” record.

The return value, when treated as a boolean, says whether a domain exists.

You can pass “a” or “mx” as the lookup_record parameter. Otherwise, the lookup_dns parameter from the constructor is used. “a” means verify that the domain exists. “mx” means verify that the domain exists and specifies mail servers.
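The “a” case can be approximated with the standard library alone, as in the sketch below. It is not the module's implementation (which relies on pydns): lookup_a_record and its resolve parameter are hypothetical, and an “mx” lookup is left out because it needs a real DNS library.

```python
import socket

# Taken from the false_positive_ips attribute shown above.
FALSE_POSITIVE_IPS = ['208.67.217.132']


def lookup_a_record(domain, resolve=socket.gethostbyname):
    """Return the IPv4 address of domain, or None if it does not resolve.

    resolve is injectable so that the function can be exercised without
    network access; by default the system resolver is used.
    """
    try:
        ip = resolve(domain)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return None
    if ip in FALSE_POSITIVE_IPS:
        # A catch-all resolver answered for a domain that does not exist.
        return None
    return ip


if __name__ == '__main__':
    # Needs network access; the result depends on your resolver.
    print(lookup_a_record('example.com'))
```

As in the description above, the return value can be used directly as a boolean: None means the domain was not found.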

validate(part)

OpenDNS has a feature that bites us. If you are using OpenDNS and you type into your browser a domain that does not exist, OpenDNS catches that and presents a page: “Did you mean www.hovercraft.eels?” For us, this feature appears as a false positive when looking up the DNS server. We try to work around it by treating the addresses listed in false_positive_ips as nonexistent domains.

validate_domain(part)[source]

class bag.email_validator.EmailHarvester(*a, **k)[source]

Bases: bag.email_validator.EmailValidator

harvest(text)[source]

Yield the e-mail addresses contained in text.
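A generator of this kind could be sketched as below. The CANDIDATE pattern is a loose, hypothetical one chosen for brevity, and this stand-alone harvest function is not the class's real method. Presumably the real harvester also runs each candidate through the validator.

```python
import re

# Hypothetical, deliberately loose pattern for spotting candidates in text.
CANDIDATE = re.compile(r'[\w.+\-]+@[\w\-]+(?:\.[\w\-]+)+')


def harvest(text):
    """Yield each e-mail-like substring found in text, in order."""
    for match in CANDIDATE.finditer(text):
        # Drop a leading dot picked up from surrounding punctuation.
        yield match.group(0).strip('.')


if __name__ == '__main__':
    sample = 'Write to ann@example.com or bob.smith@mail.example.org.'
    for address in harvest(sample):
        print(address)
```

Since it yields lazily, wrap the call in list() or set() when you need all addresses at once.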

class bag.email_validator.EmailValidator(local_part_chars=".-+_!#$%&'/=`|~?^{}*", **k)[source]

Bases: bag.email_validator.DomainValidator
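How a local_part_chars string might be turned into a regular expression is sketched below. This is an assumption about the mechanism, not the class's actual code. build_local_part_regex is a hypothetical helper, and treating word characters as always allowed is part of that assumption.

```python
import re


def build_local_part_regex(local_part_chars):
    """Compile a regex accepting word characters plus the given punctuation."""
    # re.escape keeps characters such as "-" or "^" literal inside [...].
    allowed = r'\w' + re.escape(local_part_chars)
    return re.compile('^[' + allowed + ']+$')


if __name__ == '__main__':
    strict = build_local_part_regex('.+_')  # a stricter set, without dash
    default = build_local_part_regex(".-+_!#$%&'/=`|~?^{}*")
    for part in ['john.doe+tag', 'john-doe', 'john doe']:
        print(part, bool(strict.match(part)), bool(default.match(part)))
```

Leaving a character out of the string is what makes the policy stricter: with '.+_' the dash is rejected, while the default set accepts it.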

validate(email)
validate_email(email)[source]
validate_local_part(part)[source]

exception bag.email_validator.ValidationException[source]

Bases: ValueError

Raised when a domain or email is invalid.