bag.email_validator module
The ultimate functions for domain validation and e-mail address validation.
Why not just use a regular expression?
http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx
There are many regular expressions out there for this. The “perfect one” is several KB long and therefore unmaintainable (Perl people wrote it…).
This is 2009 and domain rules are changing too. Impossible domain names have become possible, international domain names are real…
So validating an e-mail address is more complex than you might think. Take a look at some of the rules: http://en.wikipedia.org/wiki/E-mail_address#RFC_specification
How to do it then?
I believe the solution should combine simple regular expressions with imperative programming.
E-mail validation is also dependent on the robustness principle: “Be conservative in what you do, be liberal in what you accept from others.” http://en.wikipedia.org/wiki/Postel%27s_law
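To make the idea of combining simple regular expressions with imperative programming concrete, here is a minimal sketch. It is not this module's implementation. The domain pattern is the one documented for DomainValidator.domain_regex further down; LOCAL_REGEX, check_email and the error messages are hypothetical names chosen for illustration, and the 64-character limit on the local part comes from RFC 5321. The return value follows the (value, error message) tuple convention this module uses.

```python
import re

# Domain pattern as documented for DomainValidator.domain_regex.
DOMAIN_REGEX = re.compile(r'^[\w]+[\w\.\-]*\.[\w]+$', re.IGNORECASE)
# Hypothetical, deliberately simple pattern for the local part.
LOCAL_REGEX = re.compile(r'^[\w.+-]+$')


def check_email(email):
    """Return (email, error_msg); error_msg is '' when no problem is found."""
    email = email.strip()
    # Imperative checks cover rules that are awkward to express in one regex.
    if email.count('@') != 1:
        return email, 'An e-mail address must contain exactly one "@".'
    local, domain = email.split('@')
    if len(local) > 64:  # limit from RFC 5321
        return email, 'The part before the "@" is too long.'
    if local.startswith('.') or local.endswith('.') or '..' in local:
        return email, 'Dots are misplaced in the part before the "@".'
    # Simple regular expressions cover the character-level rules.
    if not LOCAL_REGEX.match(local):
        return email, 'The part before the "@" has invalid characters.'
    if not DOMAIN_REGEX.match(domain):
        return email, 'The domain is invalid.'
    return email, ''
```

Each rule stays short and readable on its own, which is the maintainability argument made above against a single multi-kilobyte expression.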
This module recognizes that e-mail validation can be done in several different ways, according to purpose:
1) Most of the time you just want validation according to the standard rules.
So just say: v = EmailValidator()
2) If you are creating e-mail addresses for your server or your organization,
you might need to satisfy a stricter policy such as “dash is not allowed in
email addresses”. The EmailValidator constructor accepts a local_part_chars
argument to help build the right regular expression for you.
Example: v = EmailValidator(local_part_chars='.+_')  # no dash in the list
3) What about typos? An erroneous dot at the end of a typed email is typical. Other common errors with the dots revolve around the @: user@.domain.com. These typing mistakes can be automatically corrected, saving you from doing it manually. For this you use the fix flag when instantiating a validator:
d = DomainValidator(fix=True)
domain, error_message = d.validate('.supercalifragilistic.com.br')
if error_message:
    print('Invalid domain:', domain)
else:
    print('Valid domain:', domain)
4) TODO: Squash the bugs in this feature!
Paranoid people may wish to verify that the given domain actually exists.
For that you can pass a lookup_dns='a' argument to the constructor, or even
lookup_dns='mx' to verify that the domain actually has e-mail servers.
To use this feature, you need to install the pydns library:
pip install pydns
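The typo corrections described in item 3 can be pictured with a small sketch. This is an assumption about what such a fix might look like, not the code behind the fix flag: fix_dots is a hypothetical helper that strips stray dots from both ends of the input and from either side of the "@".

```python
def fix_dots(address):
    """Remove stray dots at the ends of *address* and around the "@".

    Hypothetical helper for illustration; works on domains and e-mails.
    """
    # Trim whitespace first, then dots at either end of the whole string.
    address = address.strip().strip('.')
    if '@' not in address:
        return address  # a bare domain: nothing else to do
    # Split on the last "@" and clean the dots that touch it.
    local, _, domain = address.rpartition('@')
    return local.rstrip('.') + '@' + domain.lstrip('.')
```

For example, fix_dots('user@.domain.com') gives 'user@domain.com', and the domain from the example in item 3 loses its leading dot.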
How to use
The validating methods return a tuple (email, error_msg). email is the trimmed and perhaps fixed email. error_msg is an empty string when the e-mail is valid.
Typical usage is:
v = EmailValidator()  # or EmailValidator(fix=True)
email = input('Type an email: ')  # raw_input() on Python 2
email, err = v.validate(email)
if err:
    print('Error:', err)
else:
    print('E-mail is valid:', email)  # the email, trimmed and perhaps fixed
There is also an EmailHarvester class to collect e-mail addresses from any text.
See also tests/test_email_validator.py
- class bag.email_validator.DomainValidator(fix=False, lookup_dns=None)
Bases: bag.email_validator.BaseValidator
A domain name validator that is ready for internationalized domains.
http://en.wikipedia.org/wiki/Internationalized_domain_name
http://en.wikipedia.org/wiki/Top-level_domain
- domain_pattern = '[\\w]+[\\w\\.\\-]*\\.[\\w]+'
- domain_regex = re.compile('^[\\w]+[\\w\\.\\-]*\\.[\\w]+$', re.IGNORECASE)
- false_positive_ips = ['208.67.217.132']
- lookup_domain(domain, lookup_record=None)
Looks up the DNS record for domain and returns:
None if it does not exist,
The IP address if looking up the “A” record, or
The list of hosts in the “MX” record.
The return value, when treated as a boolean, says whether a domain exists.
You can pass “a” or “mx” as the lookup_record parameter. Otherwise, the lookup_dns parameter from the constructor is used. “a” means verify that the domain exists. “mx” means verify that the domain exists and specifies mail servers.
- validate(part)
OpenDNS has a feature that bites us. If you are using OpenDNS and you type into your browser a domain that does not exist, OpenDNS catches that and presents a page: “Did you mean www.hovercraft.eels?” For us, this feature appears as a false positive when looking up the domain in DNS. So we try to work around it (see false_positive_ips above).
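A workaround of this kind can be sketched as follows. This is not the library's code: the module relies on pydns, whereas this sketch swaps in the standard library's socket.gethostbyname for the "A" lookup. The address is the one listed in false_positive_ips; lookup_a_record and its resolve parameter are hypothetical names, the latter added so the lookup can be replaced without touching the network.

```python
import socket

# Address listed in DomainValidator.false_positive_ips in this documentation.
FALSE_POSITIVE_IPS = ['208.67.217.132']


def lookup_a_record(domain, resolve=socket.gethostbyname):
    """Return the IP address of *domain*, or None if it seems not to exist.

    *resolve* is a hypothetical hook: any callable mapping a host name to
    an IP address string, raising OSError when the name cannot be resolved.
    """
    try:
        ip = resolve(domain)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None
    # A resolver that answers for nonexistent domains would otherwise
    # make every domain look valid, so known catch-all addresses count
    # as "does not exist".
    if ip in FALSE_POSITIVE_IPS:
        return None
    return ip
```

As with lookup_domain, the return value can be treated as a boolean that says whether the domain exists.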
- class bag.email_validator.EmailValidator(local_part_chars=".-+_!#$%&'/=`|~?^{}*", **k)
Bases: bag.email_validator.DomainValidator
- validate(email)
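The local_part_chars argument is described above as helping to build the right regular expression. One way such an expression could be assembled is sketched below; this is an assumption for illustration and may differ from what EmailValidator does internally. build_local_part_regex is a hypothetical name, and the default string is copied from the class signature.

```python
import re


def build_local_part_regex(local_part_chars=".-+_!#$%&'/=`|~?^{}*"):
    """Compile a pattern for a local part made of word characters plus
    the punctuation listed in *local_part_chars* (hypothetical helper)."""
    # re.escape keeps characters such as "-" and "^" literal inside [...].
    return re.compile(r'^[\w' + re.escape(local_part_chars) + r']+$')
```

With a stricter policy such as build_local_part_regex('.+_'), a local part like john-doe no longer matches, while the default character set accepts it.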