php - RegEx: Issue parsing forum post body (with quote)
This is the type of thing I hammer away on until I can get it right, but in this case I believe it’s a part of regex I've never had my head around completely: the greedy vs. non-greedy stuff.
I have this content:
[quote=mick-mick topic=33586] gave dayz hour of life. can never back. :/ had wait wait. slow loads server selection screen once chose server took 3 minutes server. i'll still give h1z1 shot sure. :) [/quote] test
The regex I’m attempting to use is:
/(\[quote=[a-zA-Z0-9]+\](.*)\[\/quote\])?(.*)/m
But it’s only matching the quote line.
As you can see, I need the username (mick-mick), the topic id, the content of the quote, and the content following the quote. Also, the quote may not exist in the content at all.
Can anyone help me on this? What am I missing? I'm using preg_match in PHP.
Final update:
Matching multiple quotes and also grabbing the content that isn't in a quote got a little more difficult. But here goes (the whitespace is only layout, so this form needs the x modifier):
(?:
  \[quote=([a-z0-9\-]+) \s*topic=(\d+)\]
  (.*?)
  \[/quote\]
|
  (.+?)
  (?=\[quote|$)
)
This time we use an alternating non-capture group around everything. We either match a quote (with our capture groups 1, 2, and 3) or we match 1+ other characters with capture group 4 (this is the text that isn't part of a quote). The crucial addition here is the positive lookahead ((?=...)). This is a zero-length assertion (meaning it "checks" but doesn't consume anything) that looks for either [quote or the end of the string ($) following it. It is used so that we don't keep matching into a new quote.
Note: for a global match in PHP, you'll need to use preg_match_all().
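To make the alternation concrete, here is a minimal sketch of how the pattern could be run with preg_match_all(). The sample post and the $quotes / $other arrays are hypothetical names for illustration only, and the x modifier is assumed so that the layout whitespace in the pattern is ignored.

```php
<?php
// Hypothetical sample post: two quotes with free text before, between, and after.
$post = "intro [quote=mick-mick topic=33586]first\nquote[/quote] middle "
      . "[quote=other-user topic=42]second[/quote] test";

// s: "." also matches newlines, i: case-insensitive, x: ignore layout whitespace.
$pattern = '~
    (?:
        \[quote=([a-z0-9\-]+) \s*topic=(\d+)\]   # groups 1 and 2: user, topic id
        (.*?)                                    # group 3: quote body (lazy)
        \[/quote\]
      |
        (.+?)                                    # group 4: text outside a quote
        (?=\[quote|$)                            # stop before the next quote or at the end
    )
~six';

// PREG_SET_ORDER yields one array per match, which is easier to loop over.
preg_match_all($pattern, $post, $matches, PREG_SET_ORDER);

$quotes = [];
$other  = [];
foreach ($matches as $m) {
    if ($m[1] !== '') {
        // First branch matched: this is a quote.
        $quotes[] = ['user' => $m[1], 'topic' => $m[2], 'body' => $m[3]];
    } else {
        // Second branch matched: text that is not inside a quote.
        $other[] = $m[4];
    }
}

print_r($quotes);
print_r($other);
```

Checking $m[1] rather than $m[4] is deliberate: for a quote match the trailing, unmatched group 4 may not be present in the result array at all.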
Update:
I updated this to grab the content before/after the quote and to make the quote optional (by adding an optional non-capturing group: (?:...)?). I also re-read your question and saw that your quotes have both quote and topic (if that isn't always the case, you'll need to combine these expressions a bit). Here it is:
(.*?)(?:\[quote=([a-z0-9\-]+)\s*topic=(\d+)\](.*)\[/quote\])?(.*)
And used like:
preg_match('~(.*?)(?:\[quote=([a-z0-9\-]+)\s*topic=(\d+)\](.*)\[/quote\])?(.*)~si', $html, $matches);
$matches[0]; // full match
$matches[1]; // before quote (empty if the quote doesn't exist)
$matches[2]; // quote value: `mick-mick`
$matches[3]; // topic value: `33586`
$matches[4]; // quote contents: `I just...`
$matches[5]; // everything else (the entire string if the quote doesn't exist)
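As a quick check of those group numbers, the sketch below runs the same pattern against three hypothetical inputs (the variable names and sample strings are mine, not from the question): a post that starts with a quote, a post without any quote, and a post with text in front of the quote.

```php
<?php
$pattern = '~(.*?)(?:\[quote=([a-z0-9\-]+)\s*topic=(\d+)\](.*)\[/quote\])?(.*)~si';

// Hypothetical inputs for illustration.
$withQuote   = "[quote=mick-mick topic=33586]line one\nline two[/quote] test";
$noQuote     = "just a plain reply";
$leadingText = "hello [quote=mick-mick topic=33586]quoted[/quote] bye";

preg_match($pattern, $withQuote, $a);
echo $a[2], ' | ', $a[3], ' | ', $a[5], "\n";   // user, topic id, text after the quote

preg_match($pattern, $noQuote, $b);
var_dump($b[1], $b[5]);                         // empty string, then the whole input

preg_match($pattern, $leadingText, $c);
var_dump($c[1], $c[2], $c[5]);                  // see the note below
```

One caveat, as far as I can tell from stepping through the pattern: when text comes before the quote, the lazy (.*?) is satisfied with an empty match, the optional group is skipped, and the final (.*) takes the whole string, so groups 1 to 4 come back empty for $leadingText. The alternation pattern from the final update does not have this problem.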
You have a few issues in your expression, but you were pretty close. Here is a cleaned-up version:
\[quote=([a-z0-9\-]+)\s*(.*?)\](.*)\[/quote\]
You can use it like this:
preg_match('~\[quote=([a-z0-9\-]+)\s*(.*?)\](.*)\[/quote\]~si', $html, $matches);
$matches[0]; // full match
$matches[1]; // quote value: `mick-mick`
$matches[2]; // quote parameters: `topic=33586`
$matches[3]; // quote contents: `I just...`
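Since the question was about greedy vs. non-greedy matching, here is a small sketch of the difference using this cleaned-up pattern. The two-quote sample string and the $greedy / $lazy names are hypothetical; the only change between the two patterns is (.*) versus (.*?) for the quote contents.

```php
<?php
// Hypothetical post containing two separate quotes.
$post = "[quote=aa topic=1]one[/quote] between [quote=bb topic=2]two[/quote]";

// Greedy contents: (.*) runs to the end and backtracks to the LAST [/quote].
$greedy = '~\[quote=([a-z0-9\-]+)\s*(.*?)\](.*)\[/quote\]~si';

// Lazy contents: (.*?) stops at the FIRST [/quote] it can reach.
$lazy = '~\[quote=([a-z0-9\-]+)\s*(.*?)\](.*?)\[/quote\]~si';

preg_match($greedy, $post, $g);
preg_match($lazy, $post, $l);

echo $g[3], "\n";   // one[/quote] between [quote=bb topic=2]two
echo $l[3], "\n";   // one
```

With a single quote per post both versions give the same result; the lazy form only matters once a post can contain more than one quote.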
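The fundamental issue you had was that you wrapped the quote part in (...)?, followed by (.*). This means the first part was optional, so when it couldn't be matched it was simply skipped, and the trailing (.*) then matched 0+ characters. Since . does not match a new line (unless you use the s modifier, as in my example), it only matched the first line of your quote.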
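The behaviour described above can be reproduced with a short sketch. The multi-line $post below is a hypothetical stand-in for the real forum post, and the first pattern is the one from the question.

```php
<?php
// Hypothetical multi-line post; the real one has the quote text on its own lines.
$post = "[quote=mick-mick topic=33586]\nquoted text\n[/quote]\ntest";

// The pattern from the question.
$original = '/(\[quote=[a-zA-Z0-9]+\](.*)\[\/quote\])?(.*)/m';
preg_match($original, $post, $m);

// The optional group cannot match (the hyphen stops [a-zA-Z0-9]+), so it is
// skipped and (.*) consumes characters only up to the first newline.
var_dump($m[1]);   // empty: the quote group did not take part in the match
var_dump($m[3]);   // only the first line of the post

// The same effect in isolation: "." stops at a newline unless "s" is set.
preg_match('/.*/',  "a\nb", $withoutS);
preg_match('/.*/s', "a\nb", $withS);
var_dump($withoutS[0], $withS[0]);
```

Note that the m modifier only changes what ^ and $ match; it is the s modifier that lets . cross line breaks.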
Also, you used quote=[a-zA-Z0-9]+ when your quote tag ([quote=mick-mick topic=33586]) had a hyphen, a space, and an equals sign in it. Instead, I used [a-z0-9\-] (with the i modifier for case-insensitivity), followed by optional whitespace (\s*), followed by a lazy capture of the rest of the parameters.
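A minimal sketch of that character-class point, run against just the opening tag (the variable names are hypothetical):

```php
<?php
$tag = '[quote=mick-mick topic=33586]';

// Original class: letters and digits only, immediately followed by "]".
$narrow = '~\[quote=[a-zA-Z0-9]+\]~';

// Wider version: allows "-", then optional whitespace, then the remaining parameters.
$wider = '~\[quote=([a-z0-9\-]+)\s*(.*?)\]~i';

$narrowHit = preg_match($narrow, $tag);        // 0: the hyphen breaks the match
$widerHit  = preg_match($wider, $tag, $m);     // 1

echo $narrowHit, ' ', $widerHit, "\n";
echo $m[1], ' / ', $m[2], "\n";                // mick-mick / topic=33586

// The i modifier lets the lowercase-only class accept mixed case as well.
preg_match($wider, '[quote=Mick-Mick topic=1]', $mixed);
echo $mixed[1], "\n";                          // Mick-Mick
```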
Let me know if you have questions or want different functionality.