regex - Removing lines from a text file using Python and regular expressions
I have some text files, and I want to remove all lines that begin with an asterisk (“*”).
Made-up example:

    words
    *remove me
    words
    words
    *remove me

My current code fails. It follows below:

    import re

    program = open(program_path, "r")
    program_contents = program.readlines()
    program.close()

    new_contents = []
    pattern = r"[^*.]"
    for line in program_contents:
        match = re.findall(pattern, line, re.DOTALL)
        if match.group(0):
            new_contents.append(re.sub(pattern, "", line, re.DOTALL))
        else:
            new_contents.append(line)

    print new_contents

This produces ['', '', '', '', '', '', '', '', '', '', '*', ''], which is no good.
I’m a Python novice, but I’m eager to learn. Eventually, I’ll bundle this into a function (right now I’m just trying to figure it out in an IPython notebook).
Thanks for the help!
Your regular expression seems to be incorrect:
[^*.] means: match any single character that is neither * nor a period (.). Inside a bracket expression, a leading ^ negates the set, and everything after it is treated as a literal character. This means that the . in your expression matches a literal . character; it is not a wildcard.
This is why you get "*" for lines starting with *: you're replacing every character except the *! You would also keep any . present in the original string. Since the other lines contain neither * nor ., all of their characters are replaced.
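To make that concrete, here is a small sketch (written for Python 3; the sample strings are invented for illustration) of what re.sub does with the character class [^*.]:

```python
import re

pattern = r"[^*.]"  # any single character that is neither "*" nor "."

# Hypothetical sample lines, not taken from the asker's file.
samples = ["words\n", "*remove me\n", "a.b*c\n"]

# Every character other than "*" and "." is replaced by the empty string.
# A negated class also matches the newline, so that disappears too.
stripped = [re.sub(pattern, "", line, flags=re.DOTALL) for line in samples]
print(stripped)  # ['', '*', '.*']
```

Note the flags= keyword. The fourth positional argument of re.sub is count, not flags, so passing re.DOTALL positionally limits the number of replacements instead of setting a flag.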
If you want to match lines beginning with *:

    ^\*.*

What might be easier is something like this:

    pat = re.compile("^[^*]")

    for line in contents:
        if re.search(pat, line):
            new_contents.append(line)

This code just keeps any line that does not start with *.
In the pattern ^[^*], the first ^ matches the start of the string. The expression [^*] matches any character except *. Together, the pattern matches any starting character of a string that isn't *.
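As a rough sketch of how the two patterns discussed here behave (Python 3; the contents list is a stand-in for the lines read from the file, and the trailing \n? in the second pattern is my own addition so that the line break is removed as well):

```python
import re

# Hypothetical file contents mirroring the made-up example in the question.
contents = ["words\n", "*remove me\n", "words\n", "words\n", "*remove me\n"]

# Approach 1: keep lines whose first character exists and is not "*".
keep = re.compile(r"^[^*]")
new_contents = [line for line in contents if keep.search(line)]
print(new_contents)  # ['words\n', 'words\n', 'words\n']

# Approach 2: delete starred lines from the whole text in one pass.
# re.MULTILINE makes ^ match at the start of every line, not just the string.
text = "".join(contents)
drop = re.compile(r"^\*.*\n?", re.MULTILINE)
cleaned = drop.sub("", text)
print(repr(cleaned))  # 'words\nwords\nwords\n'
```

One caveat with the first approach: a completely empty string has no first character, so ^[^*] does not match it and the line is dropped. Lines returned by readlines() normally end in a newline, so this only matters for other inputs.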
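It is a good trick to really think about what you need when using regular expressions. Do you need to assert something about the string, do you need to change or remove characters in the string, or do you need to match substrings?
In terms of Python, you need to think about what each function is giving you and what you need to do with it. Sometimes, as in my example, you only need to know that a match was found. Other times you might need to do something with the match.
Sometimes re.sub isn't the fastest or the best approach. Why bother going through each line and replacing all of its characters when you can just skip that line entirely? There's no sense in making an empty string when you're filtering.
Most importantly: do I really need a regex? (Here you don't!)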
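For instance, the code in the question calls .group(0) on the result of re.findall. A short sketch (Python 3, with an invented sample line) of what each function actually hands back:

```python
import re

line = "*remove me\n"  # hypothetical sample line

# re.search returns a match object, or None when nothing matches.
found = re.search(r"^\*", line)
print(found is not None)  # True
print(found.group(0))     # *

# re.findall returns a plain list of strings, which has no .group() method.
all_hits = re.findall(r"[^*.]", line)
print(type(all_hits).__name__, len(all_hits))  # list 10
```

If all you need is a yes/no answer, testing the result of re.search against None is enough; the match object only matters when you want the matched text or its position.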
You don't really need a regular expression here. Since you know the size and position of your delimiter, you can simply check like this:

    if line[0] != "*":

This will be faster than a regex. Regular expressions are powerful tools and can be neat puzzles to figure out, but for delimiters with a fixed width and position, you don't really need them. A regex is a more expensive approach than one that makes use of this information.
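Bundled into a function, the regex-free check might look like the sketch below (Python 3). The name drop_starred_lines is made up for this example, and it uses str.startswith rather than line[0], a small substitution that avoids an IndexError on an empty string:

```python
def drop_starred_lines(lines):
    """Return only the lines that do not begin with an asterisk."""
    # str.startswith is safe on an empty string, unlike line[0].
    return [line for line in lines if not line.startswith("*")]


# Hypothetical input, including one empty string to show the edge case.
sample = ["words\n", "*remove me\n", "", "words\n"]
result = drop_starred_lines(sample)
print(result)  # ['words\n', '', 'words\n']
```

Because iterating over an open file yields its lines, the same function should also accept a file object directly, for example one opened on the program_path from the question.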