So, I’m trying to parse a string literal with nom. This is the code I’ve come up with:
```rust
use nom::{
    bytes::complete::{tag, take_until},
    sequence::delimited,
    IResult,
};

/// Parses string literals.
fn parse_literal<'a>(input: &'a str) -> IResult<&'a str, &'a str> {
    // The escape tag identifier is the same as the delimiter, obviously.
    let escape_tag_identifier = match input.chars().next() {
        Some(c @ ('\'' | '"')) => c,
        // Empty input, or input that does not start with a quote.
        _ => {
            return Err(nom::Err::Error(nom::error::Error::new(
                input,
                nom::error::ErrorKind::Verify,
            )))
        }
    };
    let delimiter = escape_tag_identifier.to_string();
    let (remaining, value) = delimited(
        tag(delimiter.as_str()),
        // Takes everything up to the *next* delimiter.
        take_until(delimiter.as_str()),
        tag(delimiter.as_str()),
    )(input)?;
    Ok((remaining, value))
}

#[cfg(test)]
mod literal_tests {
    use super::*;
    use rstest::rstest;

    #[rstest]
    #[case(r#""foo""#, "foo")]
    #[case(r#""foo bar""#, "foo bar")]
    #[case(r#""foo \" bar""#, r#"foo " bar"#)]
    fn test_dquotes(#[case] input: &str, #[case] expected_output: &str) {
        let result = parse_literal(input);
        assert_eq!(result, Ok(("", expected_output)));
    }

    #[rstest]
    #[case("'foo'", "foo")]
    #[case("'foo bar'", "foo bar")]
    #[case(r#"'foo \' bar'"#, "foo ' bar")]
    fn test_squotes(#[case] input: &str, #[case] expected_output: &str) {
        let result = parse_literal(input);
        assert_eq!(result, Ok(("", expected_output)));
    }

    #[rstest]
    #[case(r#""foo'"#, "foo'")]
    #[case(r#"'foo""#, r#"foo""#)]
    fn test_errs(#[case] input: &str, #[case] expected_err_input: &str) {
        let result = parse_literal(input);
        assert_eq!(
            result,
            Err(nom::Err::Error(nom::error::Error::new(
                expected_err_input,
                nom::error::ErrorKind::TakeUntil
            ))),
        );
    }
}
```
Note: The example uses `rstest` for tests.
Although it looks a little complex, it actually isn’t. The parse function is `parse_literal`, and the tests are split into double-quote, single-quote, and error cases.
When you run the tests, you will see that the first and second cases for both single and double quotes pass. The problem is the third case of each: `#[case(r#""foo \" bar""#, r#"foo " bar"#)]` for `test_dquotes` and `#[case(r#"'foo \' bar'"#, "foo ' bar")]` for `test_squotes`.
Ideally, if a string literal is delimited by single quotes and contains single quotes, those inner quotes can be escaped with a backslash. The same goes for double quotes. To demonstrate in pseudocode:
```
"foo ' bar"   // is ok
"foo \" bar"  // is ok
"foo " bar"   // is err
'foo " bar'   // is ok
'foo \' bar'  // is ok
'foo ' bar'   // is err
```
Currently, the code uses `take_until` to consume characters up to the next delimiter. In this case the input is guaranteed to contain only the string literal, so for the first and second test cases that next delimiter is the closing one and everything works.
But the third cases fail, of course: the input contains the delimiter character early on (the escaped quote), so `take_until` stops there and the parser finishes early, returning the rest of the literal as remaining input.
This is only for research purposes, so you don’t need to give a fully featured answer; a pointer in the right direction is also appreciated.
Thanks in advance.
Have you tried the `escaped` function? https://docs.rs/nom/latest/nom/bytes/complete/fn.escaped.html

Hmm, didn’t see that. Lemme play with it a little. Maybe I can come up with something. Thank you.
What I do to parse strings (pseudocode since I’m on mobile, don’t copy-paste):

```
delimited(
    ",
    many0(alt(
        any_character_except_quote_or_slash,
        pair('\', escaped_char),
    )),
    ",
)
```

where `any_character_except_quote_or_slash` and `escaped_char` are defined somewhere else; the rest of the parsers come from nom.

You may want to wrap `pair` with a `map` and `many0` with `recognize`.