regex - Removing lines from a text file using Python and regular expressions


I have some text files, and I want to remove all lines that begin with an asterisk (“*”).

A made-up example:

words
*remove me
words
words
*remove me

My current code fails. It follows below:

import re

program = open(program_path, "r")
program_contents = program.readlines()
program.close()

new_contents = []
pattern = r"[^*.]"
for line in program_contents:
    match = re.findall(pattern, line, re.DOTALL)
    if match.group(0):
        new_contents.append(re.sub(pattern, "", line, re.DOTALL))
    else:
        new_contents.append(line)

print new_contents

This produces ['', '', '', '', '', '', '', '', '', '', '*', ''], which is no good.

I’m very much a Python novice, but I’m eager to learn. Eventually, I’ll bundle this into a function (right now I’m just trying to figure it out in an IPython notebook).

Thanks for any help!

Your regular expression seems to be incorrect:

[^*.] 

This means: match any single character that isn't * or . (the leading ^ negates the class). Inside a bracket expression, everything after that first ^ is treated as a literal character. So the . in your expression matches a literal . character, not a wildcard.

This is why you get "*" for lines starting with *: you're replacing every character except the *! It would also keep any . present in the original string. Since the other lines do not contain * or ., all of their characters are replaced.
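To see this concretely, here is a small sketch in Python 3 that applies the pattern from the question with re.sub. The sample lines are made up for illustration. Note also that the fourth positional argument of re.sub is count, not flags, so flags such as re.DOTALL need to be passed by keyword; this pattern does not need any.

```python
import re

pattern = r"[^*.]"  # any single character that is NOT '*' or '.'

# Hypothetical sample lines, not taken from the asker's file.
samples = ["words\n", "*remove me\n", "a.b*c\n"]

# re.sub deletes every character the class matches (newlines included),
# so only the '*' and '.' characters survive.
stripped = [re.sub(pattern, "", line) for line in samples]
print(stripped)  # ['', '*', '.*']
```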

If you want to match lines beginning with *:

^\*.* 
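As a rough sketch of how that pattern could be used to delete the lines in one pass over the whole file contents (Python 3): the re.MULTILINE flag and the trailing \n? are additions made here so that ^ matches at every line start and the line break goes away with the line. The text variable holds the made-up example from the question.

```python
import re

text = "words\n*remove me\nwords\nwords\n*remove me\n"

# With re.MULTILINE, ^ matches at the start of every line.
# '\*' is a literal asterisk, '.*' consumes the rest of the line,
# and '\n?' also removes the line break so no blank line is left behind.
starred_line = re.compile(r"^\*.*\n?", re.MULTILINE)

cleaned = starred_line.sub("", text)
print(repr(cleaned))  # 'words\nwords\nwords\n'
```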

What might be easier is something like this:

pat = re.compile(r"^[^*]")

# contents is the list of lines read from the file
new_contents = []
for line in contents:
    if pat.search(line):
        new_contents.append(line)

This code just keeps any line that does not start with *.

In the pattern ^[^*], the first ^ matches the start of the string. The expression [^*] matches any character except *. Together, the pattern matches any string whose first character isn't *.

It is a good trick to think about what you actually need when using regular expressions. Do you need to assert something about a string, do you need to change or remove characters in a string, or do you need to match substrings?

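In terms of Python, you need to think about what each function is giving you and what you need to do with it. Sometimes, as in my example, you only need to know that a match was found. Other times you might need to do something with the match.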
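A short sketch (Python 3) of what two of the re functions hand back may help here. The sample strings are made up. Because re.findall returns a plain list, calling .group(0) on its result, as the code in the question does, would raise an AttributeError.

```python
import re

line = "*remove me"

# re.findall returns a plain list of matched strings (possibly empty).
found = re.findall(r"^\*", line)

# re.search returns a match object on success and None otherwise,
# so its result can be used directly in an `if` test.
hit = re.search(r"^\*", line)
miss = re.search(r"^\*", "words")

print(found)         # ['*']
print(hit.group(0))  # *
print(miss)          # None
```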

Sometimes re.sub isn't the fastest or the best approach. Why bother going through each line and replacing all of its characters, when you can just skip that line entirely? There's no sense in making an empty string when you're filtering.

Most importantly: do I really need a regex? (Here you don't!)

You don't really need a regular expression here. Since you know the size and position of your delimiter, you can simply check like this:

if line[0] != "*":  

This will be faster than a regex. They're very powerful tools and can be neat puzzles to figure out, but for delimiters with a fixed width and position, you don't really need them. A regex is more expensive than an approach that makes use of this information.
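Bundled into a function, the regex-free check might look like the sketch below (Python 3). The name remove_starred_lines is hypothetical, and str.startswith is swapped in for line[0] because indexing an empty string raises an IndexError.

```python
def remove_starred_lines(lines):
    """Return only the lines that do not begin with an asterisk."""
    # str.startswith is safe on empty strings, unlike line[0].
    return [line for line in lines if not line.startswith("*")]


# Made-up sample data, including an empty string to show the edge case.
sample = ["words\n", "*remove me\n", "words\n", "", "*remove me\n"]
kept = remove_starred_lines(sample)
print(kept)  # ['words\n', 'words\n', '']
```

Since a file object iterates over its lines, the same function should also accept an open file directly, for example the one opened from program_path in the question.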

