blog | Gitar

Add more info to markdown.py about implementation

I've been playing with the idea of what the best way to write a Markdown 'transpiler' in Python would be. I've settled on a tree-based solution, where all blocks will be represented in a tree. See markdown.py for more information.

Author: Maarten 'Vngngdn' Vangeneugden
Date: July 18, 2017, 6:18 p.m.
Hash: 62e5f976d88bc53b2dd2fedcff0d02f137c83f7d
Parent: d3365599fb04a10e62959e03fc0ee9884a656800
Modified file: markdown.py

markdown.py

20 additions and 3 deletions.


"""
fucking shit, I've decided to write my own implementation. Contrary to the one
on PyPI, my version handles **all** cases, and is a **full implementation** of
the reference.

Oh, and just so you know: You don't need an entire shitty object-oriented
system to make something decent. Sometimes the solution is a function. Period.
"""

"""
Checklist of all the shit that must be implemented:
    - Headers need IDs that match their title. BUT! IDs mustn't contain
      spaces, and need to be unique. The latter isn't that big of a deal,
      but spaces in the header title must be converted to dashes.
    - HTML code needs to be escaped: & must become &amp;, < and > become &lt;
      and &gt;, and " becomes &quot; inside attribute values. This isn't
      necessary for UTF-8 symbols such as ©, which can be put in place as is
      instead of being converted to &copy;.
    - Some values have to be placed inside the tag itself, such as the URL of
      a link in <a href="...">. These spots are marked with {#} placeholders
      ({0}, {1}, ... in str.format syntax). The context in which they appear
      in the defaults should make clear which number refers to what.
    - Remember to support 2 trailing spaces as <br />!
    - There are also "closing ATX headers": "# title" is the same as
      "# title ####" and "# title #". (So the closing #'s are purely cosmetic:
      strip them, together with the surrounding whitespace, in these cases.)
    - When code is used, call Pygments to mark up the code properly. If a
      language tag is provided (e.g. "Python", "C", ...), pass that to
      Pygments as well, so it can do a better job. If nothing is provided,
      leave it as is. Inline code (`CODE`) is always left as is.
      Look how to do it at
      <http://docs.getpelican.com/en/stable/content.html#syntax-highlighting>.
    """

def toHTML(
        file,
        emphasis = "<em>",
        emphasis_end = "</em>",
        strong = "<strong>",
        strong_end = "</strong>",
        unordered_list = "<ul>",
        unordered_list_end = "</ul>",
        ordered_list = "<ol>",
        ordered_list_end = "</ol>",
        list_item = "<li>",
        list_item_end = "</li>",
        hyperlink = "<a href=\"{0}\" title=\"{1}\">",
        hyperlink_end = "</a>",
        image = "<img src=\"{0}\">",
        image_end = "",  # <img> is a void element: it has no closing tag
        paragraph = "<p>",
        paragraph_end = "</p>",
        blockquote = "<blockquote>",
        blockquote_end = "</blockquote>",
        header1 = "<h1 id=\"{0}\">",
        header1_end = "</h1>",
        header2 = "<h2 id=\"{0}\">",
        header2_end = "</h2>",
        header3 = "<h3 id=\"{0}\">",
        header3_end = "</h3>",
        header4 = "<h4 id=\"{0}\">",
        header4_end = "</h4>",
        header5 = "<h5 id=\"{0}\">",
        header5_end = "</h5>",
        header6 = "<h6 id=\"{0}\">",
        header6_end = "</h6>",
        code = "<code>",
        code_end = "</code>",
        ):
    # Zoom zoom insert magic code here

    file_content = file.read()


    # for more information. That said, it's imperative to **first** collect all
    # information about hyperlinks (e.g. reference definitions such as
    # "[id]: url") and remove it from the text, so it can be used when
    # parsing hyperlinks.

    # The length is stored instead of the end because it is less error
    # prone: if a node moves because its parent is updated, only its begin
    # needs to change, as the length of the node stays the same. And if the
    # begin is relative to the parent node, even that won't have to change.
    node = {
            "type": block_type,
            "begin": begin,
            "length": length,
            "children": children,
            }

    return markdown_code
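
For the header-ID item in the checklist, here is a minimal sketch of one way to do it. The helper name `header_id` and the numeric-suffix scheme for duplicates are assumptions for illustration (the checklist only says IDs must be unique); runs of whitespace become dashes as required.

```python
import re


def header_id(title, used_ids):
    """Turn a header title into a unique HTML id.

    Hypothetical helper, not part of markdown.py: whitespace runs become
    dashes, and a numeric suffix is appended if the id is already taken.
    """
    base = re.sub(r"\s+", "-", title.strip())
    candidate = base
    counter = 1
    # Keep bumping the suffix until we find an id nobody has used yet.
    while candidate in used_ids:
        candidate = "{0}-{1}".format(base, counter)
        counter += 1
    used_ids.add(candidate)
    return candidate


seen = set()
print(header_id("Getting started", seen))  # Getting-started
print(header_id("Getting started", seen))  # Getting-started-1
```

The set of used ids has to live for the whole document, so it would be created once per call of `toHTML` and passed along.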
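
The escaping item maps directly onto Python's standard library: `html.escape` converts `&`, `<` and `>` (plus `"` and `'` when `quote=True`) and leaves non-ASCII characters such as © alone. `escape_text` is just a hypothetical wrapper name.

```python
import html


def escape_text(text):
    """Escape characters that are special in HTML (hypothetical wrapper).

    html.escape turns & < > into entities, and " and ' as well when
    quote=True. Non-ASCII characters such as © pass through untouched.
    """
    return html.escape(text, quote=True)


print(escape_text('5 < 6 & "quotes" ©'))
# 5 &lt; 6 &amp; &quot;quotes&quot; ©
```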
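
The `{#}` fields in the `toHTML` defaults use `str.format` syntax, so filling a template could look like the sketch below. It assumes `{0}` is the URL and `{1}` the title, as the `hyperlink` default suggests; `render_link` is a hypothetical helper, not part of markdown.py.

```python
import html

# Same defaults as the hyperlink parameters of toHTML.
HYPERLINK = "<a href=\"{0}\" title=\"{1}\">"
HYPERLINK_END = "</a>"


def render_link(text, url, title="",
                hyperlink=HYPERLINK, hyperlink_end=HYPERLINK_END):
    """Fill the {0}/{1} fields of the hyperlink template (hypothetical).

    {0} receives the URL and {1} the title. Both are escaped so that a
    quote inside the title can't break out of the attribute.
    """
    opening = hyperlink.format(html.escape(url, quote=True),
                               html.escape(title, quote=True))
    return opening + html.escape(text) + hyperlink_end


print(render_link("Pelican", "http://getpelican.com/", 'The "docs"'))
# <a href="http://getpelican.com/" title="The &quot;docs&quot;">Pelican</a>
```

Because the template is a parameter, a caller who overrides `hyperlink` only has to keep the same numbering for this to keep working.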
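
The "2 trailing spaces as <br />" item can be handled with a single regular expression over a paragraph's text. This is a sketch; `hard_breaks` is a hypothetical name, and it assumes the paragraph has already been separated from its neighbours so the final line has no newline after it.

```python
import re

# Two or more spaces right before a newline mark a hard line break.
HARD_BREAK = re.compile(r" {2,}\n")


def hard_breaks(paragraph, br="<br />"):
    """Replace 'two or more trailing spaces + newline' with a <br /> tag."""
    return HARD_BREAK.sub(br + "\n", paragraph)


print(hard_breaks("Roses are red  \nViolets are blue"))
# Roses are red<br />
# Violets are blue
```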
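
For the closing ATX headers, one option is to let the regular expression that recognises a header also swallow the optional closing run of `#`. The sketch below assumes the CommonMark rule that the closing sequence must be preceded by whitespace, so a title like "C# tips" survives intact; `parse_atx_header` is a hypothetical name.

```python
import re

# 1-6 '#', whitespace, the title, then an optional closing sequence of
# '#' characters that must be preceded by whitespace (CommonMark style).
ATX_HEADER = re.compile(r"^(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")


def parse_atx_header(line):
    """Return (level, title) for an ATX header line, or None (hypothetical)."""
    match = ATX_HEADER.match(line)
    if match is None:
        return None
    hashes, title = match.groups()
    return len(hashes), title


for line in ("# title", "# title #", "# title ####", "## C# tips"):
    print(parse_atx_header(line))
# (1, 'title')
# (1, 'title')
# (1, 'title')
# (2, 'C# tips')
```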
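
The Pygments item could be sketched with Pygments' `highlight`, `get_lexer_by_name` and `HtmlFormatter`. The fallback behaviour (plain escaped `<pre><code>` when there is no language tag, the tag is unknown, or Pygments isn't installed) and the name `render_code_block` are assumptions, not something markdown.py specifies.

```python
import html

try:
    from pygments import highlight
    from pygments.formatters import HtmlFormatter
    from pygments.lexers import get_lexer_by_name
    from pygments.util import ClassNotFound
except ImportError:  # Pygments not installed: only plain output is possible
    highlight = None


def render_code_block(source, language=None):
    """Render a code block as HTML (hypothetical helper).

    With a known language tag, Pygments picks the matching lexer. Without
    a tag (or with an unknown one, or without Pygments) the code is only
    escaped and wrapped in <pre><code>, i.e. left as is.
    """
    if language and highlight is not None:
        try:
            lexer = get_lexer_by_name(language.lower())
        except ClassNotFound:
            lexer = None
        if lexer is not None:
            return highlight(source, lexer, HtmlFormatter())
    return "<pre><code>" + html.escape(source) + "</code></pre>"


print(render_code_block("print('hi')\n", "Python"))
print(render_code_block("x < y\n"))
```

Inline code (`CODE`) never goes through this function, matching the checklist.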
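
The comment in `toHTML` says hyperlink information must be collected and removed first. Assuming that means Markdown's reference-style definitions (`[id]: url "title"`), a first pass might look like this sketch. `collect_references` is hypothetical, and the regex covers only the common forms (up to three spaces of indentation, optional `<...>` around the URL, a title in `""`, `''` or `()`).

```python
import re

# A reference definition on its own line, e.g.
#   [id]: http://example.com/  "Optional Title"
REFERENCE = re.compile(
    r"^ {0,3}\[([^\]\n]+)\]:[ \t]*<?(\S+?)>?"
    r"(?:[ \t]+[\"'(](.*?)[\"')])?[ \t]*$",
    re.MULTILINE,
)


def collect_references(text):
    """Return (text_without_definitions, {id: (url, title)}) (hypothetical).

    Ids are lowercased because Markdown link ids are case-insensitive.
    Removed definitions leave an empty line behind, which is harmless
    since blank lines only separate blocks.
    """
    references = {}
    for match in REFERENCE.finditer(text):
        ref_id, url, title = match.groups()
        references[ref_id.lower()] = (url, title or "")
    return REFERENCE.sub("", text), references


body, refs = collect_references(
    'See [the docs][pelican].\n\n[pelican]: http://getpelican.com/ "Pelican"\n')
print(refs)  # {'pelican': ('http://getpelican.com/', 'Pelican')}
```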
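
The tree from the commit message and the node dict in `toHTML` store a relative `begin` plus a `length`. The sketch below builds nodes with the same keys and shows why that pays off: moving a parent changes one number, and absolute positions are recomputed on the fly. `make_node` and `absolute_spans` are hypothetical helpers, not part of markdown.py.

```python
def make_node(block_type, begin, length, children=None):
    """Build a node with the same keys as the dict in markdown.py.

    begin is relative to the parent's begin; length is independent of
    where the node sits, so neither changes when an ancestor moves.
    """
    return {
        "type": block_type,
        "begin": begin,
        "length": length,
        "children": children or [],
    }


def absolute_spans(node, offset=0):
    """Yield (type, absolute_begin, absolute_end) for node and descendants."""
    begin = offset + node["begin"]
    yield node["type"], begin, begin + node["length"]
    for child in node["children"]:
        yield from absolute_spans(child, begin)


quote = make_node("blockquote", 10, 20, [make_node("paragraph", 2, 15)])
print(list(absolute_spans(quote)))
# [('blockquote', 10, 30), ('paragraph', 12, 27)]
quote["begin"] += 5  # move only the parent; the child is untouched
print(list(absolute_spans(quote)))
# [('blockquote', 15, 35), ('paragraph', 17, 32)]
```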

