Discussion (26 Comments)
Do use CDATA nodes, but only work on XML with an actual XML DOM library instead of string manipulation. Browsers have these built-in (DOMParser).
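As a rough illustration of the "use a DOM library, not string manipulation" advice, here is a minimal sketch using Python's standard `xml.dom.minidom` rather than the browser `DOMParser` mentioned above. The `build_item` helper and the sample strings are made up for this example.

```python
from xml.dom import minidom


def build_item(title: str, html_body: str) -> str:
    """Build an RSS <item> through a DOM API instead of string concatenation.

    build_item is a hypothetical helper for illustration only.
    """
    doc = minidom.Document()
    item = doc.createElement("item")
    doc.appendChild(item)

    # Plain text node: the serializer escapes & and < when writing it out.
    title_el = doc.createElement("title")
    title_el.appendChild(doc.createTextNode(title))
    item.appendChild(title_el)

    # CDATA node: the HTML body is written verbatim between <![CDATA[ and ]]>.
    desc_el = doc.createElement("description")
    desc_el.appendChild(doc.createCDATASection(html_body))
    item.appendChild(desc_el)

    return item.toxml()


xml_text = build_item("Tom & Jerry <live>", "<p>Hello & welcome</p>")

# Reading it back through the parser recovers the original strings.
parsed = minidom.parseString(xml_text).documentElement
title_back = parsed.getElementsByTagName("title")[0].firstChild.data
body_back = parsed.getElementsByTagName("description")[0].firstChild.data
```

The point of the sketch is that the caller never writes an escape sequence by hand; the library decides how each node is serialized.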
> The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings " &amp; " and " &lt; " respectively. The right angle bracket (>) may be represented using the string " &gt; ", and MUST, for compatibility, be escaped using either " &gt; " or a character reference when it appears in the string " ]]> " in content, when that string is not marking the end of a CDATA section.
> In the content of elements, character data is any string of characters which does not contain the start-delimiter of any markup and does not include the CDATA-section-close delimiter, " ]]> ". In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, " ]]> ".
> To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as " &apos; ", and the double-quote character (") as " &quot; ".
https://www.w3.org/TR/xml/#syntax
The description contains HTML markup, such as <p></p> for paragraph breaks. CDATA is a nice and clean way to encode them without breaking anything.
The title doesn't contain any markup, and shouldn't. A good old escape function covers both the "doesn't" part and the "shouldn't" part.
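For the title case, a minimal sketch of the "good old escape function" approach, assuming Python's standard `xml.sax.saxutils.escape`; the `title_element` helper is hypothetical and only there to show the idea.

```python
from xml.sax.saxutils import escape


def title_element(title: str) -> str:
    """Return a <title> element with &, < and > escaped as character data.

    title_element is a hypothetical helper, not part of any feed library.
    """
    return f"<title>{escape(title)}</title>"


# Any stray markup in the title ends up as literal text, not as tags.
safe_title = title_element("AT&T <3 RSS")
```

Because every `<` is escaped, a title that accidentally contains markup cannot open a tag in the feed, which is the "shouldn't" part.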
I see people stuffing all sorts of HTML tags and nonstandard attributes in an RSS <description>, just because CDATA allows them to do so without breaking the parser. Images, videos, inline SVGs with maybe some scripts inside...
The RSS spec should never have allowed this. Reading a feed would have been much more pleasant (not to mention safer for everyone!) if the contents were required to be in plain text.
At least with a CDATA tag you're being explicitly told “here be dragons”
Whether it's efficient is a far second to whether it successfully imports the data.
Looking at you, WP All Import...
But I'd want to see evidence that this is actually the case. The OP seems to argue "don't use CData, because the escape sequence for ]]> looks confusing" - and that's just vibes, not a proper argument.
If it's for "looks" I think CDATA would actually be the much better choice. ]]> appears extremely rarely in RSS content, while <>& are guaranteed to appear if your content is HTML. So in 99.99% of cases you won't need any escaping at all for CDATA and can just insert the HTML verbatim, while "regular" escaping will change every single angle bracket of your HTML.
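To make the comparison concrete, here is a small sketch of both encodings in Python. It assumes the commonly used trick of splitting each `]]>` across two CDATA sections; the `wrap_cdata` and `text_of` helpers and the sample HTML are invented for this example.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape


def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections.

    "]]" closes the first section and ">" opens the next one, so the
    terminator never appears inside a single section.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def text_of(fragment: str) -> str:
    """Parse the fragment inside a <description> and return its character data."""
    root = minidom.parseString(f"<description>{fragment}</description>").documentElement
    return "".join(node.data for node in root.childNodes)


# Deliberately awkward sample that contains the CDATA terminator.
html = "<p>if (a[b[0]]>c) & done</p>"

as_cdata = wrap_cdata(html)   # only the "]]>" is touched
as_escaped = escape(html)     # every <, > and & is rewritten

# Both encodings decode to the same string.
same = text_of(as_cdata) == html and text_of(as_escaped) == html
```

For HTML that never contains `]]>`, `wrap_cdata` leaves the content byte-for-byte unchanged, which is the "insert the HTML verbatim" case described above.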
I recently became aware of RSS stylesheets. Apparently there is a specification for that called XSLT, which is distinct from CSS in both form and function. However, there are plans by Google/Mozilla to remove XSLT from their browser engines for security/maintainability reasons. Apparently RSS supports JavaScript though, so it's possible to manipulate the RSS DOM that way. One could imagine a JavaScript polyfill that interprets XSLT, although I'm not sure if there are some cross-site security issues that would make that impractical.
More like a little island in the XML archipelago.
> RSS stylesheets. Apparently there is a specification for that called XSLT
XSLT is a bit more than just “RSS stylesheets”.
No need to imagine a polyfill, they already exist: https://github.com/mfreed7/xslt_polyfill
I made a site to get people started: https://www.rss.style/
Example RSS feed: https://www.rss.style/changelog.xml
Cross-site is fine by default, though the script is small enough to easily self-host. If you have a content-security-policy, you'll need to allow the host in script-src.
I've found CDATA invaluable, because I can just copy and paste the content from the HTML file to the XML file. I've never used the CDATA terminator characters in a blog post, so that's a non-problem.
Yes, that's why I said, "I come from a very different, old-school perspective."
However, I don't find the points persuasive:
1. A special case for the CDATA terminator doesn't seem any worse than special cases for every HTML character that needs to be escaped in XML.
2. I'm not sure who exactly the hypothetical misled people are (straw men?) who would think "the content is raw HTML or somehow safer."
3. I'm not sure how splitting CDATA blocks is "less uniform" than escaping characters, or why less uniform output is a downside, especially as you state in another comment, "IMHO, RSS is for feed readers, not humans."
4. I'm not sure how CDATA makes "debugging confusing," and in any case using CDATA blocks inside an article seems like a pretty rare case; like I said, I haven't done that myself.
The content of that notorious discussion went on and off and on and off for weeks, giving all the netizens of the RSS community blogosphere terrible headaches, with people's entire blogs disappearing and reappearing every second, until it finally reached a flashing point, when Dave Winer humbly conceded that it wasn't the user's fault for being an idiot, and maybe just maybe there was a tiny teeny little design flaw in RSS, and it wasn't actually such a great idea to allow HTML tags in RSS titles.
I Wanna Be <![CDATA[
Sung to the tune of “I Wanna Be Sedated”, with apologies to The Ramones.