<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">inline</div><div class=""><div class=""><br class=""><!-- signature open --><div class="">Regards</div>(From<span class="Apple-style-span" style="-webkit-tap-highlight-color: rgba(26, 26, 26, 0.296875); -webkit-composition-fill-color: rgba(175, 192, 227, 0.230469); -webkit-composition-frame-color: rgba(77, 128, 180, 0.230469); "> mobile)</span><!-- signature close --></div><div class=""><span class="Apple-style-span" style="-webkit-tap-highlight-color: rgba(26, 26, 26, 0.296875); -webkit-composition-fill-color: rgba(175, 192, 227, 0.230469); -webkit-composition-frame-color: rgba(77, 128, 180, 0.230469); "><br class=""></span></div>On May 2, 2016, at 2:23 PM, John Holdsworth <<a href="mailto:mac@johnholdsworth.com" class="">mac@johnholdsworth.com</a>> wrote:<br class=""><br class=""></div><blockquote type="cite" class=""><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">I'm having trouble getting the `e` modifier to work as advertised, at least for the sequence `\\`. For example, `print(e"\\\\")` prints two backslashes, and `print(e"\\\")` seems to try to escape the string literal. I'm currently envisioning `e` as disabling *all* backslash escapes, so these behaviors wouldn't be appropriate. 
It also looks like interpolation is still enabled in `e` strings.</div><div class=""><div class=""><br class="">Since other things like `print(e"\w+")` work just fine, I'm guessing this is a bug in the proposal's sketches (not being clear enough about the expected behavior), not your code.<br class=""><br class="">I've written a gist with some tests to show how I expect things to work:<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre">        </span><a href="https://gist.github.com/brentdax/be3c032bc7e0c101d7ba8b72cd1a692e" class="">https://gist.github.com/brentdax/be3c032bc7e0c101d7ba8b72cd1a692e</a></div></div></blockquote><div class=""><br class=""></div>The problem here is that I’ve not implemented unescaped literals fully as it would require changes outside the lexer.</div><div class="">This is because the string is first lexed and tokenised by one piece of code <span style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures; color: rgb(79, 129, 135);" class="">Lexer</span><span style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;" class="">::lexStringLiteral </span>but later</div><div class="">on in the code generation phase it generates the actual literal in a function <span style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures; color: rgb(79, 129, 135);" class="">Lexer</span><span style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;" class="">::getEncodedStringSegment.</span></div><div class="">This is passed the same string from the source file but does not know what modifiers should be applied. As a result</div><div class="">normal escapes are still processed. All the “e” flag does is silence the error for invalid escapes during tokenising.</div></div></blockquote><div class=""><br class=""></div><div class="">Lexer just lays ropes around certain areas to tell what's where. 
Sometimes this is not enough for extra semantics. This is why I went down the path of a custom string_multiline_literal token. It looks like you might want to consider that path too. <span style="background-color: rgba(255, 255, 255, 0);" class="">If you do, you might </span><span style="background-color: rgba(255, 255, 255, 0);" class="">consider the merits of suggesting that half the work be put in place now, allowing both our experiments (and other, more sophisticated ones) to lean on it, as an alternative to just </span>directly adding extra conditional code in the default lexer code.</div><div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div class="">Having encountered this limitation I managed to persuade myself this is what you want anyway but perhaps few would agree,</div><div class="">What has been implemented is more of an r”” than a e”” that solves the “picket fence” problem where you can also interpolate</div><div class="">into convenient regex literals. This is all beyond the scope of this proposal anyway so I’ll leave that battle for another day.</div></div></blockquote><blockquote type="cite" class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div class="">The changes to the compiler for anything else would be a step up in terms of disruption.</div></div></blockquote><br class=""></div><div class="">I found that by separating <i class="">new</i> from <i class="">existing</i> in Lexer using a new token, you can go further along without really disrupting the original flow. Having a custom token would give you a differentiation point to know how to treat the contents differently. 
As a concrete example, this is how I deal with the two-character prefix/postfix around multiline literals while keeping the existing interpolation logic in place:</div><div class=""><div style="margin: 0px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2" class=""><br class=""></span></div><div style="margin: 0px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2" class="">void</span><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #4f8187" class="">Lexer</span><span style="font-variant-ligatures: no-common-ligatures" class="">::getStringLiteralSegments(UNCHANGED SIG) {</span></div></div><div style="margin: 0px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> // normal initialization</span></div><div style="margin: 0px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""><br class=""></span></div><div style="margin: 0px; line-height: normal; font-family: Menlo; color: rgb(0, 132, 0);" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> </span><span style="font-variant-ligatures: no-common-ligatures" class="">// drop double character marker of multiline literals</span></div><div style="margin: 0px; line-height: normal; font-family: Menlo; color: rgb(49, 89, 93);" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2" class="">if</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> (Str.</span><span style="font-variant-ligatures: no-common-ligatures" class="">is</span><span style="font-variant-ligatures: 
no-common-ligatures; color: #000000" class="">(</span><span style="font-variant-ligatures: no-common-ligatures; color: #4f8187" class="">tok</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class="">::</span><span style="font-variant-ligatures: no-common-ligatures" class="">string_multiline_literal</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class="">)) {</span></div><div style="margin: 0px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> Bytes = Bytes.</span><span style="font-variant-ligatures: no-common-ligatures; color: #3d1d81" class="">drop_front</span><span style="font-variant-ligatures: no-common-ligatures" class="">().</span><span style="font-variant-ligatures: no-common-ligatures; color: #3d1d81" class="">drop_back</span><span style="font-variant-ligatures: no-common-ligatures" class="">();</span></div><div style="margin: 0px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> }</span></div><div style="margin: 0px; line-height: normal; font-family: Menlo; min-height: 14px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""></span> </div><div style="margin: 0px; line-height: normal; font-family: Menlo; min-height: 14px;" class=""> // normal segmenter below</div><div style="margin: 0px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">}</span></div><div style="margin: 0px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""><br class=""></span></div><div class="">Just thinking… another way to differentiate could be to seed the second lexer with a specific initial token, giving it a different context in which to interpret incoming chars. 
That would probably give you the extra context you seem to be looking for (without widening the signature of the existing parser/lexer communication channel).</div><div class=""><br class=""></div><div class="">@dabrahams / @clattner</div><div class="">Might I ask if it would be possible to get even a very high-level <i class="">yup</i>/<i class="">nope</i> answer regarding the feasibility of using the temporary lexer-swapping facility to inline SIL contents as the body of a multiline string literal expression? </div><div class=""><br class=""></div></body></html>