blog

Have a minimal working Markdown transpiler

Author: Maarten 'Vngngdn' Vangeneugden
Date: Aug. 7, 2017, 10:48 p.m.
Hash: 772aedaf0099886a369576cc8f8ab9a6860ced2e
Parent: 62e5f976d88bc53b2dd2fedcff0d02f137c83f7d
Modified files:
DEPENDENCIES.md
markdown.py
views.py

DEPENDENCIES.md

2 additions and 1 deletion.

 ============

 Since web apps (and everything web-related) are known to suck at dependency handling (which is why we have Bower, NPM, cdnjs, Docker, ...; I'll explain sometime somewhere why this is done, and why it's useless), I've decided to take matters into my own hands: I'll just provide one simple Markdown file listing all dependencies in a human-readable format. I don't care whether this helps or not, but no harm done, right?

 Legend
 ------

 Each dependency has been given a grade. Here is what the grades mean:

  * A: This dependency is included in the app as ready-to-use source code files. As such, it needs no further attention.
  * B: This dependency is included in the app as an archive file (e.g. .tar.gz), OR it needs additional work to function properly. In these cases, I refer to its documentation, which will be available at the location of the files.
  * C: This dependency does not come with the app. These are often system packages, which (if you're on GNU/Linux) usually means searching for them in your distribution's repositories. If something doesn't work as it should, their documentation is your last resort.

 I hope I never have to, but if it becomes necessary, I'll add a D grade for unreadable, egregious, bogus software that will give you all kinds of horrible diseases if you dare to work with it, but that is indispensable to make the web app function. Just be happy I don't use it yet. (For example, PHP qualifies as D, so praise [our lord and saviour](https://rms.sexy) I write Python.)

 A
 -
  * [Materialize](http://materializecss.com/): The layout framework for the web app. (Material Design is the best)


 B
 -

 C
 -
  * [Markdown](https://pypi.python.org/pypi/Markdown): Required to compile the
-   blog post files to HTML.
+   blog post files to HTML.

+

markdown.py

164 additions and 36 deletions.

+import pygments
+
+""" So welcome to my Markdown module. Since the markdown library in PyPI is
 fucking shit, I've decided to write my own implementation. Contrary to the one in
 PyPI, my version handles **all** cases, and is a **full implementation** of the
 reference.

 Oh, and just so you know: You don't need an entire shitty object oriented system
 to make something decent. Sometimes the solution is a function. Period.
 """

 """
 Checklist about all shit that must be implemented:
     - headers need to have their IDs be the same as the title. BUT! IDs
       mustn't have spaces, and need to be unique. The latter isn't that big of a
       deal, but spaces in the header title must be converted to dashes.
     - HTML code needs to be escaped; & must become &amp;, < and > become &lt;
       and &gt; and so on. This isn't necessary for UTF-8 symbols such as ©,
       which can be put in place as is, instead of converting to &copy;.
     - Some elements have to be placed in the tag itself, such as links in <a />.
       This is noted with the {#} tags. The context in which they are used in the
       defaults should give a good explanation of what number points to what.
     - Remember to support 2 trailing spaces as <br />!
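
The escaping and `<br />` items on this checklist need nothing beyond the standard library. The following is a minimal sketch, not the module's own code: `escape_text` and `hard_breaks` are hypothetical helper names. It relies on `html.escape(..., quote=False)`, which converts `&`, `<` and `>` but leaves characters such as © as they are.

```python
import html
import re


def escape_text(text):
    """Escape &, < and > so they show up literally in the HTML output.

    quote=False leaves quotes alone; non-ASCII symbols such as © are kept
    as-is instead of being turned into entities like &copy;.
    """
    return html.escape(text, quote=False)


def hard_breaks(text):
    """Turn two or more trailing spaces at the end of a line into <br />."""
    return re.sub(r" {2,}\n", "<br />\n", text)


print(escape_text("Fish & chips <3 © 2017"))  # → Fish &amp; chips &lt;3 © 2017
print(hard_breaks("first line  \nsecond line"))
```

Escaping has to run before any tags are generated; otherwise the `<br />` (or any other tag the transpiler emits) would be escaped as well.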
     - There are also "closing ATX headers": "# title" is the same as
       "# title ####" and "# title #". (So the closing hashes are purely cosmetic;
       remove them and the trailing whitespace in these cases.)
     - When code is used, call Pygments to mark up the code properly. If a code
       tag is provided (e.g. "Python", "C", ...), tell that to Pygments as well,
       so it can do a better job. If nothing is provided, leave it as is. When
       it's an inline code block (`CODE`), always leave that as is.
       Look how to do it at
       <http://docs.getpelican.com/en/stable/content.html#syntax-highlighting>.
     """
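
For the Pygments item, one possible approach is sketched below, using Pygments' documented `highlight()`, `get_lexer_by_name()` and `HtmlFormatter`. The names `render_code_block` and `render_inline_code` are hypothetical, and the plain `<pre><code>` fallback (used when no language tag is given, the tag is unknown, or Pygments is not installed) is an assumption about what "leave it as is" should mean.

```python
import html

try:
    from pygments import highlight
    from pygments.formatters import HtmlFormatter
    from pygments.lexers import get_lexer_by_name
    from pygments.util import ClassNotFound
except ImportError:  # Pygments is treated as optional in this sketch
    highlight = None


def render_code_block(code, language=None):
    """Render a fenced code block as HTML.

    With a language tag (and Pygments installed), the code is highlighted.
    Without a tag, or with a tag Pygments does not know, the code is only
    escaped and wrapped in <pre><code>, i.e. "left as is".
    """
    if language and highlight is not None:
        try:
            lexer = get_lexer_by_name(language.lower())
        except ClassNotFound:
            lexer = None
        if lexer is not None:
            return highlight(code, lexer, HtmlFormatter())
    return "<pre><code>{}</code></pre>".format(html.escape(code, quote=False))


def render_inline_code(code):
    """Inline `code` is never highlighted, only escaped."""
    return "<code>{}</code>".format(html.escape(code, quote=False))


print(render_code_block("x < 1"))  # → <pre><code>x &lt; 1</code></pre>
```

The highlighted markup only carries CSS classes; a matching stylesheet can be generated with `HtmlFormatter().get_style_defs('.highlight')`.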
+Future expansions:
+    - Allow nesting of more elements. For example: Headers cannot be nested in
+      blockquotes, but this is a nice thing to have.
+    - Allow headers to follow a line wrapping, if the next line is preceded by
+      the same number of hashtags (=> same header level).
+    - Allow the special paragraph blockquote style:
+      https://daringfireball.net/projects/markdown/syntax#blockquote
+    """

 def toHTML(
+def toHTML(
         file,
-        emphasis = "<em>",
-        emphasis_end  ="</em>",
-        strong = "<strong>",
-        strong_end = "</strong>",
-        unordered_list = "<ul>",
-        unordered_list_end = "</ul>",
-        ordered_list = "<ol>",
-        ordered_list_end = "</ol>",
-        list_item = "<li>",
-        list_item_end = "</li>",
-        hyperlink = "<a href=\"{0}\" title=\"{1}\">",
-        hyperlink_end = "</a>",
-        image = "<img src=\"{0}\">",
-        image_end = "</img>",
-        paragraph = "<p>",
-        paragraph_end = "</p>",
-        blockquote = "<blockquote>",
-        blockquote_end = "</blockquote>",
-        header1 = "<h1 id=\"{0}\">",
-        header1_end = "</h1>",
-        header2 = "<h2 id=\"{0}\">",
-        header2_end = "</h2>",
-        header3 = "<h3 id=\"{0}\">",
-        header3_end = "</h3>",
-        header4 = "<h4 id=\"{0}\">",
-        header4_end = "</h4>",
-        header5 = "<h5 id=\"{0}\">",
-        header5_end = "</h5>",
-        header6 = "<h6 id=\"{0}\">",
-        header6_end = "</h6>",
-        code = "<code lang=\"{0}\">",
-        code_end = "</code>",
-        ):
+        emphasis = r"<em>{text}</em>",
+        strong = r"<strong>{text}</strong>",
+        unordered_list = r"<ul>{items}</ul>",
+        ordered_list = r"<ol>{items}</ol>",
+        list_item = r"<li>{text}</li>",
+        hyperlink = r'<a href="{link}" title="{title}">{text}</a>',
+        image = r'<img src="{link}" alt="{alt}" title="{title}" />',
+        paragraph = r"<p>{text}</p>",
+        blockquote = r"<blockquote>{text}</blockquote>",
+        header1 = r'<h1 id="{link}">{text}</h1>',
+        header2 = r'<h2 id="{link}">{text}</h2>',
+        header3 = r'<h3 id="{link}">{text}</h3>',
+        header4 = r'<h4 id="{link}">{text}</h4>',
+        header5 = r'<h5 id="{link}">{text}</h5>',
+        header6 = r'<h6 id="{link}">{text}</h6>',
+        code = r'<code lang="{language}">{code}</code>',
+        incorrect = r"<s>{text}</s>",
+        line_break = r"<br />",
+        horizontal_rule = r"<hr />",
+        ):
     # Zoom zoom insert magic code here
+
+    This is a pure function.
+
+    This function will translate given Markdown code to HTML code.
+    It follows the specification as closely as possible, with a few custom additions:
+    - Incorrect text can be marked with "~" around a text block.
+
+    The parameters have sane defaults, but can be customized if you wish
+    to do so. Pay attention to the placeholders (such as {text}), as your
+    custom value must also incorporate these.
+
+    The function works in a simple way:
+    1. Replace all redundant content with only 1 unique part
+    1.1. For example: 5 blank lines mean the same as 2; a line with only spaces
+         and tabs means the same as an empty line; hashtags at the end of a header
+         line are meaningless; ...
+    2. Handle blockquotes. Blockquotes have the highest precedence and can contain
+       any other element, thus it's easiest to just handle these as soon as possible.
+    3. Replace Setext with atx-style headers, to provide consistency for header handling.
+    4. Handle block elements (paragraphs, code, ...).
+    5. In all block elements, handle span elements (links, emphasis, ...).
+    """
+
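
The new keyword arguments of `toHTML()` are complete templates with named placeholders (`{text}`, `{link}`, `{title}`, ...) instead of separate start and end tags. Assuming they are meant to be filled in with `str.format`, the sketch below (with a hypothetical `fill_template` helper and `DEFAULT_TEMPLATES` table) shows why a customized value has to keep its placeholders.

```python
# Two templates in the style of the new toHTML() defaults.
DEFAULT_TEMPLATES = {
    "emphasis": "<em>{text}</em>",
    "hyperlink": '<a href="{link}" title="{title}">{text}</a>',
}


def fill_template(kind, templates=None, **fields):
    """Fill the template for `kind` with the given placeholder values."""
    templates = DEFAULT_TEMPLATES if templates is None else templates
    return templates[kind].format(**fields)


print(fill_template("emphasis", text="hello"))
# → <em>hello</em>

# A customized template keeps working as long as it keeps {text}:
custom = {"emphasis": '<span class="em">{text}</span>'}
print(fill_template("emphasis", custom, text="hello"))
# → <span class="em">hello</span>

# Without {text}, the content is silently dropped; str.format raises no error.
broken = {"emphasis": "<i></i>"}
print(fill_template("emphasis", broken, text="hello"))
# → <i></i>
```

Literal braces in a custom template (for example in inline CSS) have to be doubled as `{{` and `}}`, since `str.format` treats single braces as placeholders.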
+    # Replacing some shit:
+    text = re.sub(r"^[ \t]+$", "\n", text)  # Make all blank lines consistent
+    text = re.sub(r"\n{3}", "\n\n", text)  # Replace redundant blanks with 2 blank lines
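
Step 1 of the process in the docstring (collapsing redundant blank lines) can be sketched as follows. `normalize_blank_lines` is a hypothetical name; the two details that matter are `re.MULTILINE`, so that `^` and `$` match at every line rather than only at the start and end of the whole text, and the open-ended `\n{3,}`, so that a long run of newlines collapses in a single pass.

```python
import re


def normalize_blank_lines(text):
    """Collapse redundant blank lines.

    - A line that contains only spaces and tabs becomes an empty line.
    - Any run of three or more newlines becomes exactly two, i.e. a single
      blank line between blocks.
    """
    # re.MULTILINE: ^ and $ match at every line, not only at the text's ends.
    text = re.sub(r"^[ \t]+$", "", text, flags=re.MULTILINE)
    # {3,} rather than {3}: long runs are collapsed in one pass.
    return re.sub(r"\n{3,}", "\n\n", text)


print(repr(normalize_blank_lines("a\n \t \n\n\n\n\nb")))  # → 'a\n\nb'
```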
+
+    # XXX: Blockquotes have the highest precedence: **ANYTHING** can be nested
+    # in a blockquote. So, handle these first, and convert them up front to
+    # make it easier to handle the other text.
+
+
+    """ About handling blockquotes:
+    Every line that starts with "> " is a blockquote. As long as the next line
+    starts in the same way, it's considered part of the same blockquote.
+    **However**, there is 1 exception to this rule:
+    paragraphs that are hard-wrapped only need 1 > for their first line, but can
+    then be hard wrapped, and even start without prior spacing.
+    """
+    blockquotes_left = True
+    while blockquotes_left:
+         blockquote = re.compile(r"(^> .+\n)+")
+         quote = blockquote.search(text)
+         if quote is None:
+             blockquotes_left = False
+         else:
+             begin, end = quote.span()
+             reworked = "<blockquote>" + text[begin:end].replace(r"\n> ", r"\n") + r"</blockquote>\n"
+             text = text[:begin] + reworked + text[end:]
+
+    # All blockquotes are now removed
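
The blockquote rule described in this docstring, including the exception for hard-wrapped paragraphs whose continuation lines omit the `>`, can also be handled line by line instead of with a single regular expression. A minimal sketch, assuming a blank line ends a quote and that nested quotes (`> > ...`) should become nested `<blockquote>` elements; `convert_blockquotes` is a hypothetical name, and the quoted text is left as Markdown for the later steps.

```python
def convert_blockquotes(text):
    """Wrap quoted lines in <blockquote> ... </blockquote>.

    A quote starts at a line beginning with '>'. A following line belongs to
    the same quote if it also starts with '>', or if it is a non-blank "lazy"
    continuation of a hard-wrapped paragraph inside the quote. A blank line
    ends the quote. The quoted text is processed again, so '> > ...' yields
    nested blockquotes.
    """
    out, quote = [], []

    def flush():
        if quote:
            inner = convert_blockquotes("\n".join(quote))
            out.append("<blockquote>\n" + inner + "\n</blockquote>")
            quote.clear()

    for line in text.split("\n"):
        if line.startswith(">"):
            rest = line[1:]
            # Drop the marker and at most one space after it.
            quote.append(rest[1:] if rest.startswith(" ") else rest)
        elif quote and line.strip() and quote[-1].strip():
            quote.append(line)  # lazy continuation of a wrapped paragraph
        else:
            flush()
            out.append(line)
    flush()
    return "\n".join(out)


print(convert_blockquotes("> first line\nwrapped line\n> second"))
```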
+
+    # Converting setext to atx headers
+    text = re.sub(r"^(?P<title>.+)\n=+$", r"# \g<title>", text, flags=re.MULTILINE)
+    text = re.sub(r"^(?P<title>.+)\n-+$", r"## \g<title>", text, flags=re.MULTILINE)
+    # All are now converted to atx style headers
+    # Transforming headers:
+    for i in range(1,7):
+        header = r"^#{"+str(i)+r"} (?P<title>.+)$"
+        match = re.search(header, text, flags=re.MULTILINE)
+        while match is not None:
+            future_id = match['title'].lower()
+            future_id = re.sub(r"[ _,.!]", r"-", future_id)
+            dictionary = match.groupdict()
+            dictionary['link'] = future_id
+            replacement = (r'<h'+str(i)+r' id="{link}">{title}</h'+str(i)+r'>').format_map(dictionary)
+            text = text[:match.start()] + replacement + text[match.end():]
+            match = re.search(header, text, flags=re.MULTILINE)
+
+    # All headers transformed
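
The header-related items (Setext conversion, optional closing hashes, and lowercase, dash-separated IDs that stay unique, as the checklist asks) could be combined in a single function. This is a sketch under those assumptions; `convert_headers` is a hypothetical name, and it uses `re.sub` with a replacement function so the text does not have to be searched again after every substitution.

```python
import re


def convert_headers(text):
    """Turn Setext and ATX headers into <h1> ... <h6> with unique ids."""
    # Setext headers: a line underlined with === (level 1) or --- (level 2).
    text = re.sub(r"^(.+)\n=+[ \t]*$", r"# \1", text, flags=re.MULTILINE)
    text = re.sub(r"^(.+)\n-+[ \t]*$", r"## \1", text, flags=re.MULTILINE)

    used_ids = set()

    def make_id(title):
        # Lower-case, replace runs of non-word characters by one dash, trim dashes.
        base = re.sub(r"\W+", "-", title.lower()).strip("-") or "section"
        candidate, n = base, 1
        while candidate in used_ids:  # keep ids unique: intro, intro-1, ...
            candidate = "{}-{}".format(base, n)
            n += 1
        used_ids.add(candidate)
        return candidate

    def replace(match):
        level = len(match.group(1))
        # "# title ##" means the same as "# title": drop the closing hashes.
        title = re.sub(r"[ \t]+#+[ \t]*$", "", match.group(2)).strip()
        return '<h{0} id="{1}">{2}</h{0}>'.format(level, make_id(title), title)

    return re.sub(r"^(#{1,6})[ \t]+(.+)$", replace, text, flags=re.MULTILINE)


print(convert_headers("### Hello, world! ###"))
# → <h3 id="hello-world">Hello, world!</h3>
```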
+
+    # Paragraphs
+    text = re.sub(r"(?P<text>(?:^(?!<).+\n)+)", r"<p>\n\g<text></p>", text, flags=re.MULTILINE)
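
Paragraph detection can alternatively be based on blank lines, which the normalization step has already made consistent. A minimal sketch, assuming blocks are separated by at least one blank line and that earlier steps only emit block elements that contain no blank lines themselves; `wrap_paragraphs` and `BLOCK_TAG` are hypothetical names.

```python
import re

# Blocks that earlier steps already turned into HTML block elements.
BLOCK_TAG = re.compile(r"<(?:h[1-6]|blockquote|pre|ul|ol|p|hr)\b")


def wrap_paragraphs(text):
    """Wrap each blank-line-separated block in <p> ... </p>, unless it
    already starts with a block-level tag."""
    blocks = re.split(r"\n{2,}", text.strip("\n"))
    wrapped = []
    for block in blocks:
        if not block.strip() or BLOCK_TAG.match(block):
            wrapped.append(block)
        else:
            wrapped.append("<p>{}</p>".format(block))
    return "\n\n".join(wrapped)


print(wrap_paragraphs('<h1 id="a">A</h1>\n\nFirst line\nsecond line\n\n\nNext'))
```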
+
+
+    # Doing inline hyperlinks
+    text = re.sub(r"\[(?P<text>.+)\]\((?P<url>.+) \"(?P<title>.+)\"\)", r'<a href="\g<url>" title="\g<title>">\g<text></a>', text)
+    text = re.sub(r"\[(?P<text>.+)\]\((?P<url>.+)\)", r'<a href="\g<url>">\g<text></a>', text)
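
Greedy `.+` groups, as in the two substitutions above, can run across two links on the same line (`[a](x) and [b](y)` is then read as one link whose text is `a](x) and [b`). Restricting each part to the characters it may contain avoids that, and one pattern with an optional title group covers both link forms. A sketch with `INLINE_LINK` and `convert_links` as hypothetical names; only double-quoted titles are supported.

```python
import re

# [text](url "optional title"); only double-quoted titles are handled here.
INLINE_LINK = re.compile(
    r'\[(?P<text>[^\]]+)\]'           # link text, up to the first ]
    r'\((?P<url>[^)\s]+)'             # URL: no whitespace, no )
    r'(?:\s+"(?P<title>[^"]*)")?\)'   # optional "title", then the closing )
)


def convert_links(text):
    def replace(match):
        if match.group("title") is not None:
            return '<a href="{}" title="{}">{}</a>'.format(
                match.group("url"), match.group("title"), match.group("text"))
        return '<a href="{}">{}</a>'.format(match.group("url"), match.group("text"))

    return INLINE_LINK.sub(replace, text)


print(convert_links("[a](x) and [b](y)"))
# → <a href="x">a</a> and <a href="y">b</a>
```

Images (`![alt](src)`) share this syntax, so they would have to be converted before this step.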
+
+    # Doing emphasis and strongs
+    text = re.sub(r"\*\*(?P<text>[^*.]*)\*\*", r"<strong>\g<text></strong>", text)
+    text = re.sub(r"__(?P<text>[^\_.]*)__", r"<strong>\g<text></strong>", text)
+    text = re.sub(r"\*(?P<text>[^\*.]*)\*", r"<em>\g<text></em>", text)
+    text = re.sub(r"_(?P<text>[^\_.]*)_", r"<em>\g<text></em>", text)
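
A sketch of the emphasis step that also accepts punctuation inside the span (a character class such as `[^*.]` excludes periods, so `*e.g.*` would not match) and leaves underscores inside words such as `snake_case` alone. `convert_emphasis` is a hypothetical name; the lookarounds `(?=\S)` and `(?<=\S)` require the text right inside the markers to be non-blank, so `2 * 3 * 4` stays untouched.

```python
import re


def convert_emphasis(text):
    """Convert **strong**, __strong__, *em* and _em_ spans."""
    # Strong runs first, so "**" is not read as two single "*" markers.
    text = re.sub(r"\*\*(?=\S)(.+?)(?<=\S)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"(?<!\w)__(?=\S)(.+?)(?<=\S)__(?!\w)", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(?=\S)(.+?)(?<=\S)\*", r"<em>\1</em>", text)
    # Underscores inside words (snake_case) are left alone.
    text = re.sub(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)", r"<em>\1</em>", text)
    return text


print(convert_emphasis("**Note**: see *e.g.* my_var_name and _this_."))
# → <strong>Note</strong>: see <em>e.g.</em> my_var_name and <em>this</em>.
```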
+
+    return text
+"""
+
+
+
+
+
+    block_elements_table = {
+        "code": r"```(?P<language>\w+)\n(    .*\n)+",
+        "blockquote": r"^> (?P<text>.+)
+        "paragraph": r"(?P<text>(^.+\n)+)",
+        "header": r"^#{1,6} (?P<title>(\w+ ?)+ *) ?#*$",
+
+
+    element_table = {
+        "emphasis": (r"\*(?P<text>[^\*.]*)\*|_(?P<text>[^\_.]*)_", emphasis, emphasis_end),
+        "strong": (r"\*\*(?P<text>[^*.]*)\*\*|__(?P<text>[^\_.]*)__", strong, strong_end),
+        "unordered list": (r"")
+        "inline link": (r"\[(\w\s)+\]\(
+
+
+    def translate(text, begin, end, parameters):
+
+        if alpha:  # If this contains no more nested elements:
+            return begin.format(parameters) + text + end
+        elif beta:  # text contains nested elements:
+            # Find parameters or something IDK
+            return begin.format(parameters) + _
+        translate(text[alpha:beta], begin_tag, end_tag, found_parameters) + _
+        end
+
+    # Zoom zoom insert magic code here
     # Assuming the file is open
-    file_content = file.read()
-
     # NOTE: Hyperlinks are handled specially in Markdown. Check the syntax page
     # for more information. That said, it's imperative to **first** collect all
     # information about hyperlinks, and remove it, so it can be used when
     # parsing hyperlinks.
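
The NOTE describes a two-pass approach for reference-style links: first collect the `[id]: url "title"` definitions and remove them from the text, then resolve `[text][id]` (and the implicit `[text][]`) against them. A minimal sketch of that idea; `collect_link_definitions` and `resolve_reference_links` are hypothetical names, link ids are treated case-insensitively as on the Markdown syntax page, and only double-quoted titles are recognized (the syntax page also allows single quotes and parentheses).

```python
import re

# [id]: http://example.com/ "Optional title"   (URL may be wrapped in <...>)
LINK_DEFINITION = re.compile(
    r'^ {0,3}\[(?P<id>[^\]]+)\]:[ \t]*<?(?P<url>[^\s>]+)>?'
    r'(?:[ \t]+"(?P<title>[^"]*)")?[ \t]*\n?',
    re.MULTILINE,
)
# [text][id] or [text][] (an empty id means "use the text as id")
REFERENCE_LINK = re.compile(r"\[(?P<text>[^\]]+)\] ?\[(?P<id>[^\]]*)\]")


def collect_link_definitions(text):
    """First pass: remove link definitions and return (text, definitions)."""
    definitions = {}

    def store(match):
        definitions[match.group("id").lower()] = (match.group("url"), match.group("title"))
        return ""

    return LINK_DEFINITION.sub(store, text), definitions


def resolve_reference_links(text, definitions):
    """Second pass: turn [text][id] and [text][] into <a> tags."""
    def replace(match):
        key = (match.group("id") or match.group("text")).lower()
        if key not in definitions:
            return match.group(0)  # unknown id: leave the text untouched
        url, title = definitions[key]
        title_attr = ' title="{}"'.format(title) if title is not None else ""
        return '<a href="{}"{}>{}</a>'.format(url, title_attr, match.group("text"))

    return REFERENCE_LINK.sub(replace, text)


source = 'Read [the docs][Docs] and [Google][].\n\n[docs]: http://example.com/docs "Docs"\n[google]: <http://google.com>\n'
body, links = collect_link_definitions(source)
print(resolve_reference_links(body, links))
```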

     """ The reason the length is stored instead of the end, is because it is
-    less error prone; if a parent node is updated, only the begin needs to be
+    elements = {
+        paragraph: r"",
+        ordered_list_item: r""
+        hyperlink:
+        header1: r"^# [*\(\n)] \n"
+    }
+
+"""
+""" The reason the length is stored instead of the end, is because it is
+    less error prone; if a parent node is updated, only the begin needs to be
     updated, as the length is still the same for the node. The begin can be
     relative to the parent node, so even that won't have to be updated. """
     node = {
+    node = {
             "type": block_type,
             "begin": begin,
             "length": length,
             "children": children,
             }

     return markdown_code

+
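
The docstring's argument for storing `begin` relative to the parent plus a `length`, rather than an absolute end, can be illustrated with nodes shaped like the `node` dict in this diff. `make_node` and `absolute_spans` are hypothetical helpers: absolute positions are computed on demand by adding up the parents' offsets, so shifting a subtree only requires changing the `begin` of its top node.

```python
def make_node(block_type, begin, length, children=None):
    """Build a node shaped like the `node` dict in markdown.py.

    `begin` is relative to the parent's begin; `length` is the span length.
    """
    return {
        "type": block_type,
        "begin": begin,
        "length": length,
        "children": children or [],
    }


def absolute_spans(node, offset=0):
    """Yield (type, start, end) with absolute offsets into the source text."""
    start = offset + node["begin"]
    yield node["type"], start, start + node["length"]
    for child in node["children"]:
        yield from absolute_spans(child, start)


source = "> Hello *world*"
quote = make_node("blockquote", 0, 15, [
    make_node("paragraph", 2, 13, [make_node("emphasis", 6, 7)]),
])
for kind, start, end in absolute_spans(quote):
    print(kind, repr(source[start:end]))
# blockquote '> Hello *world*'
# paragraph 'Hello *world*'
# emphasis '*world*'
```

If text is inserted before the quote (say `"Intro\n"`), only `quote["begin"]` has to grow by 6; the paragraph and emphasis spans follow automatically.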

views.py

8 additions and 1 deletion.


+
 from django.shortcuts import get_object_or_404, render # This makes it possible to render the template with the view here. It's pretty cool and important.
 from django.http import HttpResponseRedirect, HttpResponse
 from django.core.urlresolvers import reverse # Why?
 from django.template import loader # This makes it possible to actually load the template.
 from django.contrib.auth.decorators import login_required
 from django.contrib.auth import authenticate, login
 from .models import Post
 from django.core.exceptions import ObjectDoesNotExist
 from markdown import markdown
 from django.utils import translation

 # FIXME: Remove this template trash. THIS IS A VIEW, NOT A FUCKING TEMPLATE FFS
 context = {
     'materialDesign_color': "green",
     'materialDesign_accentColor': "purple",
     'navbar_title': "Blog",
     'navbar_fixed': True,
     'navbar_backArrow': True,
     #'footer_title': "Maarten's blog",
     #'footer_description': "My personal scribbly notepad.",
     #'footer_links': footer_links,
     }

 def get_available_post_languages(post):
+    # TODO: This still uses Pandoc to convert the file in the background to HTML
+    # code. That's a pretty bad solution (doesn't mean Pandoc is bad though).
+    # Remember to write a custom implementation when there's time available.
+    return subprocess.check_output(["pandoc", file_path])
+
+def get_available_post_languages(post):
     """ Returns the language codes for which a blog post exists. This function
     always returns English (because that field mustn't be empty).
     So say a blog post has an English, Dutch and French version (which means
     english_file, french_file and dutch_file aren't empty), the function will return {"en",
     "fr", "nl"}. """
     available_languages = {"en"}
     if post.german_file is not None:
-        available_languages.add("de")
+        available_languages.add("de")
     if post.spanish_file is not None:
         available_languages.add("es")
     if post.french_file is not None:
         available_languages.add("fr")
     if post.dutch_file is not None:
         available_languages.add("nl")
     return available_languages
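
The per-language `if` chain can also be driven by a table. A sketch with hypothetical names (`LANGUAGE_FIELDS`), using a `SimpleNamespace` stand-in for a `Post`. It tests truthiness rather than `is not None`: assuming the `*_file` fields are Django `FileField`s, accessing an empty one yields a `FieldFile` whose truth value is `False` rather than `None`, so an `is not None` check would always pass.

```python
from types import SimpleNamespace

# Hypothetical table: Post field name -> language code.
LANGUAGE_FIELDS = {
    "german_file": "de",
    "spanish_file": "es",
    "french_file": "fr",
    "dutch_file": "nl",
}


def get_available_post_languages(post):
    """Return the language codes for which `post` has a file ("en" always).

    A field counts as present when it is truthy, which covers both None and
    an empty (falsy) file value.
    """
    available = {"en"}
    for field, code in LANGUAGE_FIELDS.items():
        if getattr(post, field, None):
            available.add(code)
    return available


# Stand-in for a Post; "" and None play the role of missing translations.
post = SimpleNamespace(english_file="hello.md", german_file="",
                       spanish_file=None, french_file="bonjour.md",
                       dutch_file="hallo.md")
print(sorted(get_available_post_languages(post)))  # → ['en', 'fr', 'nl']
```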

 def get_preferred_post_language(post, language):
     """ Returns the post language file that best suits the given language. This
     is handy if you know what language the user prefers, but aren't sure whether
     you can provide that language. This function will try to provide the file
     for that language, or return English if that's not possible. """
     if language == "de" and post.german_file is not None:
         return post.german_file
     if language == "es" and post.spanish_file is not None:
         return post.spanish_file
     if language == "fr" and post.french_file is not None:
         return post.french_file
     if language == "nl" and post.dutch_file is not None:
         return post.dutch_file
     return post.english_file  # Returned if all other choices wouldn't be satisfactory, or the requested language is English.
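
Depending on the `LANGUAGE_CODE` and `LANGUAGES` settings, `translation.get_language()` can return region-qualified codes such as `en-us`, which exact comparisons like `language == "nl"` do not match. A sketch that reduces the code to its primary subtag before looking up the file; `FILE_FIELDS` is a hypothetical table, `SimpleNamespace` stands in for a `Post`, and a truthiness check skips empty fields as well as `None`.

```python
from types import SimpleNamespace

# Hypothetical table: language code -> Post field holding that translation.
FILE_FIELDS = {"de": "german_file", "es": "spanish_file",
               "fr": "french_file", "nl": "dutch_file"}


def get_preferred_post_language(post, language):
    """Return the file for `language`, falling back to English.

    Region-qualified codes such as "nl-be" or "de_AT" are reduced to their
    primary subtag first, so they still find the Dutch or German file.
    """
    primary = (language or "en").lower().replace("_", "-").split("-")[0]
    field = FILE_FIELDS.get(primary)
    if field is not None:
        candidate = getattr(post, field, None)
        if candidate:  # skips None and empty files alike
            return candidate
    return post.english_file


post = SimpleNamespace(english_file="hello.md", german_file="hallo-de.md",
                       spanish_file=None, french_file="", dutch_file="hallo.md")
print(get_preferred_post_language(post, "nl-be"))  # → hallo.md
print(get_preferred_post_language(post, "fr"))     # → hello.md (French is empty)
```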


 def index(request):
     template = "blog/index.html"
     posts = Post.objects.all()
     language = translation.get_language()

     post_links = []
     for post in posts:
         blog_file = get_preferred_post_language(post, language)
         # TODO: Find a cleaner way to determine the title. First and foremost:
         # If the language differs from English, the other language file needs to
         # be loaded. Plus: look for a built in function to remove the full path
         # and only return the file name.
         title = (blog_file.name.rpartition("/")[2]).rpartition(".")[0]
         date = post.published
         description = "Lorem ipsum"
         # TODO: The link can possibly be reversed in the DTL using the title, which is actually
         # a cleaner way to do it. Investigate.
         link = reverse("blog-post", args=[str(post)])
         post_links.append([title, date, description, link])

     context = {
             'post_links': post_links,
             }
     return render(request, template, context)
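
The TODO in `index()` asks for a built-in way to drop the directories from the file name. `pathlib.PurePosixPath(...).stem` does that and removes the extension in one step; a sketch, assuming the stored names use forward slashes as the `rpartition("/")` call implies. `title_from_file_name` is a hypothetical name. One difference: a name without an extension is kept whole, whereas `rpartition(".")[0]` returns an empty string for it.

```python
from pathlib import PurePosixPath


def title_from_file_name(name):
    """Return the file name without directories and without its last extension."""
    return PurePosixPath(name).stem


print(title_from_file_name("blog/posts/Hello world.md"))  # → Hello world
print(title_from_file_name("blog/posts/README"))          # → README
```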

 def post(request, title):
     template = "blog/post.html"
     post = Post.objects.get(english_file=title)
     language = translation.get_language()
     blog_file = get_preferred_post_language(post, language)
     blog_text = markdown(blog_file)

     context = {
         'article': blog_text,
         'title': blog_file.name,
         }
     return render(request, template, context)
