regex - Regular expression that both includes and excludes certain strings in R
I am trying to use R to parse through a number of entries. I have two requirements for the entries I want back: I want entries that contain the word apple but don't contain the word orange.
For example:
- i apples
- i apples
- i apples , oranges
I want entries 1 and 2 back.
How do I go about doing this using R?
Thanks.
Using a regular expression, you could do the following.
x <- c('i apples', 'i apples', 'i apples , oranges', 'i oranges , apples', 'i oranges , apples oranges more')
x[grepl('^((?!.*orange).)*apple.*$', x, perl=TRUE)]
# [1] "i apples" "i apples"
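The regular expression uses a negative lookahead, (?!.*orange), to check that the rest of the line does not contain the substring orange after any run of characters other than a line break. If that check passes, the dot . matches one character (anything except a line break). The lookahead and the dot are wrapped in a group that is repeated 0 or more times. Next comes the literal apple, followed by any character except a line break, 0 or more times. Finally, the start and end of line anchors are in place to make sure the whole input is consumed.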
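To see which entries the pattern accepts, it can help to look at the logical vector that grepl returns before subsetting. This is a minimal sketch using the same sample vector as above; the variable names pattern and matches are illustrative only.

```r
x <- c('i apples', 'i apples', 'i apples , oranges',
       'i oranges , apples', 'i oranges , apples oranges more')

pattern <- '^((?!.*orange).)*apple.*$'

# perl = TRUE is needed because lookaheads are a Perl-compatible (PCRE) feature
matches <- grepl(pattern, x, perl = TRUE)
matches
# [1]  TRUE  TRUE FALSE FALSE FALSE

# subsetting with the logical vector keeps only the accepted entries
x[matches]
# [1] "i apples" "i apples"
```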
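Update: use the following if performance is an issue. Here the lookahead is checked only once, at the start of the line, instead of before every character.
x[grepl('^(?!.*orange).*$', x, perl=TRUE)]
Note that this pattern only rejects entries containing orange. To also require apple, use '^(?!.*orange).*apple.*$'.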
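If a single pattern is not required, the two conditions can also be checked separately and combined. This is a different technique from the one in the answer, shown only as a sketch: the names has_apple, has_orange, keep and keep_regex are illustrative, and the last entry of x is a hypothetical extra case (an entry that starts with apple) added to exercise both approaches.

```r
x <- c('i apples', 'i apples', 'i apples , oranges',
       'i oranges , apples', 'i oranges , apples oranges more',
       'apples , oranges')  # hypothetical extra entry, not in the original data

# fixed = TRUE treats the search terms as literal strings, not regexes
has_apple  <- grepl('apple',  x, fixed = TRUE)
has_orange <- grepl('orange', x, fixed = TRUE)

# keep entries that mention apple and do not mention orange
keep <- has_apple & !has_orange
x[keep]
# [1] "i apples" "i apples"

# the same filter as one pattern: a single lookahead at the start, then apple
keep_regex <- grepl('^(?!.*orange).*apple', x, perl = TRUE)
identical(keep, keep_regex)
```

Splitting the test in two avoids lookaheads entirely, which some readers may find easier to maintain, at the cost of scanning the vector twice.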