blog

Add more info to markdown.py about implementation

I've been playing with the idea of what would be the best way to write a Markdown 'transpiler' in Python. I've settled on a tree-based solution, in which every block is represented as a node in a tree. See markdown.py for more information.
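As a rough illustration of the tree idea, here is a minimal sketch. It assumes each node is a plain dict with "type", "begin", "length" and "children" keys, as in the node this commit adds to markdown.py, and that "begin" is counted relative to the parent node. The helper names make_node and absolute_spans, and the sample text, are made up for this example and are not part of markdown.py.

```python
def make_node(block_type, begin, length, children=None):
    """Build a node shaped like the dict in markdown.py (hypothetical helper)."""
    return {
        "type": block_type,
        "begin": begin,    # assumption: offset relative to the parent node
        "length": length,  # stored instead of an end position
        "children": children or [],
    }


def absolute_spans(node, parent_begin=0):
    """Yield (type, absolute_begin, absolute_end) for a node and its descendants."""
    begin = parent_begin + node["begin"]
    yield node["type"], begin, begin + node["length"]
    for child in node["children"]:
        yield from absolute_spans(child, begin)


text = "# Title\n\nSome *emphasised* text.\n"
emphasis = make_node("emphasis", 5, 12)                # relative to the paragraph
paragraph = make_node("paragraph", 9, 23, [emphasis])  # relative to the document
header = make_node("header1", 0, 7)
document = make_node("document", 0, len(text), [header, paragraph])

for block_type, begin, end in absolute_spans(document):
    print(block_type, repr(text[begin:end]))

# Lengthen the title by four characters. The header and the document grow,
# and the paragraph's begin shifts, but the emphasis node is left untouched
# because its begin is relative to the paragraph and its length is unchanged.
text = text.replace("# Title", "# New Title")
header["length"] += 4
paragraph["begin"] += 4
document["length"] += 4
```

With relative offsets and stored lengths, an edit only touches the edited node, its ancestors, and the siblings that follow it. Nested nodes inside those siblings keep their values.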

Author
Maarten 'Vngngdn' Vangeneugden
Date
July 18, 2017, 8:18 p.m.
Hash
62e5f976d88bc53b2dd2fedcff0d02f137c83f7d
Parent
d3365599fb04a10e62959e03fc0ee9884a656800
Modified file
markdown.py

markdown.py

20 additions and 3 deletions.

1
1
fucking shit, I've decided to write my own implementation. Contary to the one in
2
2
PyPI, my version handles **all** cases, and is a **full implementation** of the
3
3
reference.
4
4
5
5
Oh, and just so you know: You don't need an entire shitty object oriented system
6
6
to make something decent. Sometimes the solution is a function. Period.
7
7
"""
8
8
9
9
"""
10
10
Checklist about all shit that must be implemented:
11
11
    - headers need to have their ID's be the same as the title. BUT! id's
12
12
      mustn't have spaces, and need to be unique. The latter isn't that big of a
13
13
      deal, but spaces in the header title must be converted to dashes.
14
14
    - HTML code needs to be escaped; & must become &amp;, < and > become &lt;
15
15
      and &gt; and so on. This isn't necessary for UTF-8 symbols such as ©,
16
16
      which can be put in place as is, instead of converting to &copy;.
17
17
    - Some elements have to be placed in the tag itself, such as links in <a />.
18
18
      This is noted with the {#} tags. The context in which they are used in the
19
19
      defaults should give a good explanation on what number points to what.
20
20
    - Remember to support 2 trailing spaces as <br />!
21
21
    - There are also "closing ATX headers": "# title" is the same as 
22
22
      "# title ####" and "# title #". (So it's purely cosmetic, remove the
23
23
      trailing whitespace in these cases)
24
24
    - When code is used, call Pygments to markup the code properly. If a code
25
25
      tag is provided (e.g. "Python", "C", ...), tell that to Pygments as well,
26
26
      so it can do a better job. If nothing is provided, leave it as is. When
27
27
      it's an inline code block (`CODE`), leave that always as is.
28
28
      Look how to do it at
29
29
      <http://docs.getpelican.com/en/stable/content.html#syntax-highlighting>.
30
30
    """
31
31
32
32
def toHTML(
33
33
        file,
34
34
        emphasis = "<em>",
35
35
        emphasis_end  ="</em>",
36
36
        strong = "<strong>",
37
37
        strong_end = "</strong>",
38
38
        unordered_list = "<ul>",
39
39
        unordered_list_end = "</ul>",
40
40
        ordered_list = "<ol>",
41
41
        ordered_list_end = "</ol>",
42
42
        list_item = "<li>",
43
43
        list_item_end = "</li>",
44
44
        hyperlink = "<a href=\"{0}\" title=\"{1}\">",
45
45
        hyperlink_end = "</a>",
46
46
        image = "<img src=\"{0}\">",
47
47
        image_end = "</img>",
48
48
        paragraph = "<p>",
49
49
        paragraph_end = "</p>",
50
50
        blockquote = "<quote>",
51
-
        blockquote_end = "</quote>",
52
-
        header1 = "<h1 id=\"{0}\">",
+
51
        blockquote_end = "</blockquote>",
+
52
        header1 = "<h1 id=\"{0}\">",
53
53
        header1_end = "</h1>",
54
54
        header2 = "<h2 id=\"{0}\">",
55
55
        header2_end = "</h2>",
56
56
        header3 = "<h3 id=\"{0}\">",
57
57
        header3_end = "</h3>",
58
58
        header4 = "<h4 id=\"{0}\">",
59
59
        header4_end = "</h4>",
60
60
        header5 = "<h5 id=\"{0}\">",
61
61
        header5_end = "</h5>",
62
62
        header6 = "<h6 id=\"{0}\">",
63
63
        header6_end = "</h6>",
64
64
        code = "<code>",
65
-
        code_end = "</code>",
+
65
        code_end = "</code>",
66
66
        ):
67
67
    # Zoom zoom insert magic code here
68
68
+
69
    file_content = file.read()
+
70
69
71
+
72
    # for more information. That said, it's imperative to **first** collect all
+
73
    # information about hyperlinks, and remove it, so it can be used when
+
74
    # parsing hyperlinks.
+
75
+
76
    """ The reason the length is stored instead of the end, is because it is
+
77
    less error prone; if a parent node is updated, only the begin needs to be
+
78
    updated, as the length is still the same for the node. The begin can be
+
79
    relative to the parent node, so even that won't have to be updated. """
+
80
    node = {
+
81
            "type": block_type,
+
82
            "begin": begin,
+
83
            "length": length,
+
84
            "children": children,
+
85
            }
+
86
70
87
    return markdown_code
71
88