Have a minimal working Markdown transpiler
- Author
- Maarten 'Vngngdn' Vangeneugden
- Date
- Aug. 8, 2017, 12:48 a.m.
- Hash
- 772aedaf0099886a369576cc8f8ab9a6860ced2e
- Parent
- 62e5f976d88bc53b2dd2fedcff0d02f137c83f7d
- Modified files
- DEPENDENCIES.md
- markdown.py
- views.py
DEPENDENCIES.md ¶
2 additions and 1 deletion.
1 |
1 |
============ |
2 |
2 |
|
3 |
3 |
Since web apps (and everything web related) are known to suck at dependency handling (which is why we have Bower, NPM, cdnjs, Docker, ...; I'll explain sometime somewhere why this is done, and why it's useless), I've decided to take matters into my own hands: I'll just provide one simple Markdown file listing all dependencies in a human-readable format. I don't care whether this helps or not, but no harm is done, right? |
4 |
4 |
|
5 |
5 |
Legend |
6 |
6 |
------ |
7 |
7 |
|
8 |
8 |
The dependencies themselves have been ranked with a certain grade. I explain what they mean here: |
9 |
9 |
|
10 |
10 |
* A : This dependency is present inside the app. This will be in the form of ready-to-use source code files. As such, they need no further attention. |
11 |
11 |
* B : This dependency is present inside the app. This will be in the form of an archive file (e.g. .tar.gz), OR this needs additional work to function properly. In these cases, I refer to their documentation, which will be available at the location of the files. |
12 |
12 |
* C : This dependency does not come with the app. These are often system packages, which usually means (if you're on GNU/Linux) searching for them in your repositories. Their documentation is your last resort if they don't work as they should. |
13 |
13 |
|
14 |
14 |
I hope I never will, but when it's necessary, I'll add a D grade, for unreadable, egregious bogus software that will give you all kinds of horrible diseases if you dare to work with it, but is indispensable to make the web app function. Just be happy I don't use it yet. (For example, PHP qualifies as D, so praise [our lord and saviour](https://rms.sexy) I write Python.) |
15 |
15 |
|
16 |
16 |
A |
17 |
17 |
- |
18 |
18 |
* [Materialize](http://materializecss.com/): The layout framework for the web app. (Material Design is the best) |
19 |
19 |
|
20 |
20 |
|
21 |
21 |
B |
22 |
22 |
- |
23 |
23 |
|
24 |
24 |
C |
25 |
25 |
- |
26 |
26 |
* [Markdown](https://pypi.python.org/pypi/Markdown): Required to compile the |
27 |
- | blog post files to HTML. |
+ |
27 |
blog post files to HTML. |
28 |
28 |
|
+ |
29 |
markdown.py ¶
164 additions and 36 deletions.
+ |
1 |
import pygments |
+ |
2 |
|
+ |
3 |
""" So welcome to my Markdown module. Since the markdown library in PyPI is |
1 |
4 |
fucking shit, I've decided to write my own implementation. Contrary to the one in |
2 |
5 |
PyPI, my version handles **all** cases, and is a **full implementation** of the |
3 |
6 |
reference. |
4 |
7 |
|
5 |
8 |
Oh, and just so you know: You don't need an entire shitty object oriented system |
6 |
9 |
to make something decent. Sometimes the solution is a function. Period. |
7 |
10 |
""" |
8 |
11 |
|
9 |
12 |
""" |
10 |
13 |
Checklist about all shit that must be implemented: |
11 |
14 |
- headers need to have their IDs be the same as the title. BUT! IDs |
12 |
15 |
mustn't have spaces, and need to be unique. The latter isn't that big of a |
13 |
16 |
deal, but spaces in the header title must be converted to dashes. |
14 |
17 |
- HTML code needs to be escaped; & must become &amp;, < and > become &lt; |
15 |
18 |
and &gt; and so on. This isn't necessary for UTF-8 symbols such as ©, |
16 |
19 |
which can be put in place as is, instead of converting to &copy;. |
17 |
20 |
- Some elements have to be placed in the tag itself, such as links in <a />. |
18 |
21 |
This is noted with the {#} tags. The context in which they are used in the |
19 |
22 |
defaults should give a good explanation on what number points to what. |
20 |
23 |
- Remember to support 2 trailing spaces as <br />! |
21 |
24 |
- There are also "closing ATX headers": "# title" is the same as |
22 |
25 |
"# title ####" and "# title #". (So it's purely cosmetic, remove the |
23 |
26 |
trailing whitespace in these cases) |
24 |
27 |
- When code is used, call Pygments to markup the code properly. If a code |
25 |
28 |
tag is provided (e.g. "Python", "C", ...), tell that to Pygments as well, |
26 |
29 |
so it can do a better job. If nothing is provided, leave it as is. When |
27 |
30 |
it's an inline code block (`CODE`), leave that always as is. |
28 |
31 |
Look how to do it at |
29 |
32 |
<http://docs.getpelican.com/en/stable/content.html#syntax-highlighting>. |
30 |
33 |
""" |
+ |
34 |
Future expansions: |
+ |
35 |
- Allow nesting of more elements. For example: Headers cannot be nested in |
+ |
36 |
blockquotes, but this is a nice thing to have. |
+ |
37 |
- Allow headers to wrap across lines, if the next line is preceded by |
+ |
38 |
the same amount of hashtags (=> same header level). |
+ |
39 |
- Allow the special paragraph blockquote style: |
+ |
43 |
https://daringfireball.net/projects/markdown/syntax#blockquote |
+ |
44 |
""" |
31 |
45 |
|
32 |
46 |
def toHTML( |
+ |
47 |
def toHTML( |
33 |
48 |
file, |
34 |
- | emphasis = "<em>", |
35 |
- | emphasis_end ="</em>", |
36 |
- | strong = "<strong>", |
37 |
- | strong_end = "</strong>", |
38 |
- | unordered_list = "<ul>", |
39 |
- | unordered_list_end = "</ul>", |
40 |
- | ordered_list = "<ol>", |
41 |
- | ordered_list_end = "</ol>", |
42 |
- | list_item = "<li>", |
43 |
- | list_item_end = "</li>", |
44 |
- | hyperlink = "<a href=\"{0}\" title=\"{1}\">", |
45 |
- | hyperlink_end = "</a>", |
46 |
- | image = "<img src=\"{0}\">", |
47 |
- | image_end = "</img>", |
48 |
- | paragraph = "<p>", |
49 |
- | paragraph_end = "</p>", |
50 |
- | blockquote = "<blockquote>", |
51 |
- | blockquote_end = "</blockquote>", |
52 |
- | header1 = "<h1 id=\"{0}\">", |
53 |
- | header1_end = "</h1>", |
54 |
- | header2 = "<h2 id=\"{0}\">", |
55 |
- | header2_end = "</h2>", |
56 |
- | header3 = "<h3 id=\"{0}\">", |
57 |
- | header3_end = "</h3>", |
58 |
- | header4 = "<h4 id=\"{0}\">", |
59 |
- | header4_end = "</h4>", |
60 |
- | header5 = "<h5 id=\"{0}\">", |
61 |
- | header5_end = "</h5>", |
62 |
- | header6 = "<h6 id=\"{0}\">", |
63 |
- | header6_end = "</h6>", |
64 |
- | code = "<code lang=\"{0}\">", |
65 |
- | code_end = "</code>", |
66 |
- | ): |
+ |
49 |
emphasis = r"<em>{text}</em>", |
+ |
50 |
strong = r"<strong>{text}</strong>", |
+ |
51 |
unordered_list = r"<ul>{items}</ul>", |
+ |
52 |
ordered_list = r"<ol>{items}</ol>", |
+ |
53 |
list_item = r"<li>{text}</li>", |
+ |
54 |
hyperlink = r'<a href="{link}" title="{title}">{text}</a>', |
+ |
55 |
image = r'<img src="{link}" alt="{alt}" title="{title}" />', |
+ |
56 |
paragraph = r"<p>{text}</p>", |
+ |
57 |
blockquote = r"<blockquote>{text}</blockquote>", |
+ |
58 |
header1 = r'<h1 id="{link}">{text}</h1>', |
+ |
59 |
header2 = r'<h2 id="{link}">{text}</h2>', |
+ |
60 |
header3 = r'<h3 id="{link}">{text}</h3>', |
+ |
61 |
header4 = r'<h4 id="{link}">{text}</h4>', |
+ |
62 |
header5 = r'<h5 id="{link}">{text}</h5>', |
+ |
63 |
header6 = r'<h6 id="{link}">{text}</h6>', |
+ |
64 |
code = r'<code lang="{language}">{code}</code>', |
+ |
65 |
incorrect = r"<s>{text}</s>", |
+ |
66 |
line_break = r"<br />", |
+ |
67 |
horizontal_rule = r"<hr />", |
+ |
68 |
): |
67 |
69 |
# Zoom zoom insert magic code here |
+ |
70 |
|
+ |
71 |
This is a pure function. |
+ |
72 |
|
+ |
73 |
This function will translate given Markdown code to HTML code. |
+ |
74 |
It follows the specification as closely as possible, with a few custom additions: |
+ |
75 |
- Incorrect text can be marked with "~" around a text block. |
+ |
76 |
|
+ |
77 |
The parameters have sane defaults, but can be customized if you wish |
+ |
78 |
to do so. Pay attention to the {placeholders} (such as {text}), as your custom value must also |
+ |
79 |
incorporate these. |
+ |
80 |
|
+ |
81 |
The function works in a simple way: |
+ |
82 |
1. Replace all redundant content with only 1 unique part |
+ |
83 |
1.1. For example: 5 blank lines mean the same as 2; a line with only spaces |
+ |
84 |
and tabs means the same as an empty line; hashtags at the end of a header |
+ |
85 |
line are meaningless; ... |
+ |
86 |
2. Handle blockquotes. Blockquotes have the highest precedence and can contain |
+ |
87 |
any other element, thus it's easiest to just handle these as soon as possible. |
+ |
88 |
3. Replace Setext with atx-style headers, to provide consistency for header handling. |
+ |
89 |
4. Handle block elements (paragraphs, code, ...). |
+ |
90 |
5. In all block elements, handle span elements (links, emphasis, ...). |
+ |
91 |
""" |
+ |
92 |
|
+ |
93 |
# Replacing some shit: |
+ |
94 |
text = re.sub(r"^[ \t]+$", "", text, flags=re.MULTILINE) # Make all blank lines consistent (needs re.MULTILINE so ^ and $ match per line) |
+ |
95 |
text = re.sub(r"\n{3,}", "\n\n", text) # Replace any run of 3 or more newlines with 2 newlines |
+ |
96 |
|
+ |
97 |
# XXX: Blockquotes have the highest precedence: **ANYTHING** can be nested |
+ |
98 |
# in a blockquote. So, handle these first, and convert them up front to |
+ |
99 |
# make it easier to handle the other text. |
+ |
100 |
|
+ |
101 |
|
+ |
102 |
""" About handling blockquotes: |
+ |
103 |
Every line that starts with "> " is a blockquote. As long as the next line |
+ |
104 |
starts in the same way, it's considered part of the same blockquote. |
+ |
105 |
**However**, there is 1 exception to this rule: |
+ |
106 |
paragraphs that are hard-wrapped only need 1 > for their first line, but can |
+ |
107 |
then be hard wrapped, and even start without prior spacing. |
+ |
108 |
""" |
+ |
109 |
blockquotes_left = True |
+ |
110 |
while blockquotes_left: |
+ |
111 |
blockquote = re.compile(r"(^> .+\n)+", flags=re.MULTILINE) # ^ must match at the start of every line |
+ |
112 |
quote = blockquote.search(text) |
+ |
113 |
if quote is None: |
+ |
114 |
blockquotes_left = False |
+ |
115 |
else: |
+ |
116 |
begin, end = quote.span() |
+ |
117 |
reworked = "<blockquote>" + re.sub(r"^> ", "", text[begin:end], flags=re.MULTILINE) + "</blockquote>\n" # Strip the leading "> " from every quoted line |
+ |
118 |
text = text[:begin] + reworked + text[end:] |
+ |
119 |
|
+ |
120 |
# All blockquotes are now removed |
+ |
121 |
|
+ |
122 |
# Converting setext to atx headers |
+ |
123 |
text = re.sub(r"^(?P<title>.+)\n=+$", r"# \g<title>", text, flags=re.MULTILINE) |
+ |
124 |
text = re.sub(r"^(?P<title>.+)\n-+$", r"## \g<title>", text, flags=re.MULTILINE) |
+ |
125 |
# All are now converted to atx style headers |
+ |
126 |
# Transforming headers: |
+ |
127 |
for i in range(1,7): |
+ |
128 |
header = r"^#{"+str(i)+r"} (?P<title>.+)$" |
+ |
129 |
match = re.search(header, text, flags=re.MULTILINE) |
+ |
130 |
while match is not None: |
+ |
131 |
future_id = match['title'].lower() |
+ |
132 |
future_id = re.sub(r"[ _,.!]", r"-", future_id) |
+ |
133 |
dictionary = match.groupdict() |
+ |
134 |
dictionary['link'] = future_id |
+ |
135 |
replacement = (r'<h'+str(i)+r' id="{link}">{title}</h'+str(i)+r'>').format_map(dictionary) |
+ |
136 |
text = text[:match.start()] + replacement + text[match.end():] |
+ |
137 |
match = re.search(header, text, flags=re.MULTILINE) |
+ |
138 |
|
+ |
139 |
# All headers transformed |
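Review note: the header loop above lowercases the title and replaces a few characters with dashes, but it does not yet make IDs unique or strip the cosmetic closing hashes that the checklist mentions. A minimal sketch of how that could look, assuming a numeric suffix is an acceptable way to disambiguate; the helper name `make_header_id` and the `used_ids` set are hypothetical and not part of this commit:

```python
import re

def make_header_id(title, used_ids):
    """Turn a header title into an id without spaces, unique within used_ids."""
    # Drop cosmetic closing hashes ("# title ##") and surrounding whitespace.
    title = re.sub(r"\s+#+\s*$", "", title).strip()
    # Same character class as the header loop in the diff, on the lowercased title.
    base = re.sub(r"[ _,.!]", "-", title.lower())
    candidate, counter = base, 2
    # Append -2, -3, ... until the id has not been handed out before.
    while candidate in used_ids:
        candidate = f"{base}-{counter}"
        counter += 1
    used_ids.add(candidate)
    return candidate

seen = set()
print(make_header_id("Hello World ##", seen))
print(make_header_id("Hello World", seen))
```

The closing-hash pattern requires whitespace before the hashes, so a title such as "C#" is left alone.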
+ |
140 |
|
+ |
141 |
# Paragraphs |
+ |
142 |
text = re.sub(r"(?P<text>(?:^(?!<).+\n)+)", r"<p>\n\g<text></p>", text, flags=re.MULTILINE) |
+ |
143 |
|
+ |
144 |
|
+ |
145 |
# Doing inline hyperlinks |
+ |
146 |
text = re.sub(r"\[(?P<text>.+)\]\((?P<url>.+) \"(?P<title>.+)\"\)", r'<a href="\g<url>" title="\g<title>">\g<text></a>', text) |
+ |
147 |
text = re.sub(r"\[(?P<text>.+)\]\((?P<url>.+)\)", r'<a href="\g<url>">\g<text></a>', text) |
+ |
148 |
|
+ |
149 |
# Doing emphasis and strongs |
+ |
150 |
text = re.sub(r"\*\*(?P<text>[^*.]*)\*\*", r"<strong>\g<text></strong>", text) |
+ |
151 |
text = re.sub(r"__(?P<text>[^\_.]*)__", r"<strong>\g<text></strong>", text) |
+ |
152 |
text = re.sub(r"\*(?P<text>[^\*.]*)\*", r"<em>\g<text></em>", text) |
+ |
153 |
text = re.sub(r"_(?P<text>[^\_.]*)_", r"<em>\g<text></em>", text) |
+ |
154 |
|
+ |
155 |
return text |
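Review note: the checklist at the top of the module asks for HTML escaping, which the pipeline above does not do yet. A minimal sketch, assuming the standard library's `html.escape` is good enough here; the wrapper name `escape_html` is hypothetical:

```python
import html

def escape_html(text):
    """Escape &, < and > so they render literally; quotes are left untouched.

    Non-ASCII characters such as the copyright sign pass through unchanged,
    which matches what the checklist asks for.
    """
    return html.escape(text, quote=False)

print(escape_html("Tom & Jerry <3 ©"))
```

Ordering needs care: escaping before the blockquote pass would turn the leading `>` of a quote into `&gt;`, and escaping after the pass would mangle the tags the function has already generated.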
+ |
156 |
""" |
+ |
157 |
|
+ |
158 |
|
+ |
159 |
|
+ |
160 |
|
+ |
161 |
|
+ |
162 |
block_elements_table = { |
+ |
163 |
"code": r"```(?P<language>\w+)\n( .*\n)+", |
+ |
164 |
"blockquote": r"^> (?P<text>.+) |
+ |
165 |
"paragraph": r"(?P<text>(^.+\n)+)", |
+ |
166 |
"header": r"^#{1,6} (?P<title>(\w+ ?)+ *) ?#*$", |
+ |
167 |
|
+ |
168 |
|
+ |
169 |
element_table = { |
+ |
170 |
"emphasis": (r"\*(?P<text>[^\*.]*)\*|_(?P<text>[^\_.]*)_", emphasis, emphasis_end), |
+ |
171 |
"strong": (r"\*\*(?P<text>[^*.]*)\*\*|__(?P<text>[^\_.]*)__", strong, strong_end), |
+ |
172 |
"unordered list": (r"") |
+ |
173 |
"inline link": (r"\[(\w\s)+\]\( |
+ |
174 |
|
+ |
175 |
|
+ |
176 |
def translate(text, begin, end, parameters): |
+ |
177 |
|
+ |
178 |
if alpha: # If this contains no more nested elements: |
+ |
179 |
return begin.format(parameters) + text + end |
+ |
180 |
elif beta: # text contains nested elements: |
+ |
181 |
# Find parameters or something IDK |
+ |
182 |
return begin.format(parameters) + _ |
+ |
183 |
translate(text[alpha:beta], begin_tag, end_tag, found_parameters) + _ |
+ |
184 |
end |
+ |
185 |
|
+ |
186 |
# Zoom zoom insert magic code here |
68 |
187 |
# Assuming the file is open |
69 |
- | file_content = file.read() |
70 |
- | |
71 |
188 |
# NOTE: Hyperlinks are handled specially in Markdown. Check the syntax page |
72 |
189 |
# for more information. That said, it's imperative to **first** collect all |
73 |
190 |
# information about hyperlinks, and remove it, so it can be used when |
74 |
191 |
# parsing hyperlinks. |
75 |
192 |
|
76 |
193 |
""" The reason the length is stored instead of the end, is because it is |
77 |
- | less error prone; if a parent node is updated, only the begin needs to be |
+ |
194 |
elements = { |
+ |
195 |
paragraph: r"", |
+ |
196 |
ordered_list_item: r"" |
+ |
197 |
hyperlink: |
+ |
198 |
header1: r"^# [*\(\n)] \n" |
+ |
199 |
} |
+ |
200 |
|
+ |
201 |
""" |
+ |
202 |
""" The reason the length is stored instead of the end, is because it is |
+ |
203 |
less error prone; if a parent node is updated, only the begin needs to be |
78 |
204 |
updated, as the length is still the same for the node. The begin can be |
79 |
205 |
relative to the parent node, so even that won't have to be updated. """ |
80 |
206 |
node = { |
+ |
207 |
node = { |
81 |
208 |
"type": block_type, |
82 |
209 |
"begin": begin, |
83 |
210 |
"length": length, |
84 |
211 |
"children": children, |
85 |
212 |
} |
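Review note: the comment above argues for storing a length and a parent-relative begin instead of an absolute end. A small sketch of that idea, assuming a dataclass in place of the plain dict from the diff; the class `Node` and its methods are hypothetical illustrations, not code from this commit:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    block_type: str
    begin: int    # offset relative to the parent's begin
    length: int   # stays valid when an ancestor moves
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child):
        """Attach a child and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def absolute_begin(self):
        """Sum the relative offsets up to the root."""
        node, offset = self, 0
        while node is not None:
            offset += node.begin
            node = node.parent
        return offset

    def absolute_end(self):
        return self.absolute_begin() + self.length

root = Node("blockquote", begin=10, length=40)
para = root.add(Node("paragraph", begin=5, length=20))
print(para.absolute_begin(), para.absolute_end())

root.begin = 100  # the parent moves; the child's own fields stay untouched
print(para.absolute_begin(), para.absolute_end())
```

Only `root.begin` changes in the second half, yet the paragraph's absolute span follows along, which is the property the comment describes.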
86 |
213 |
|
87 |
214 |
return markdown_code |
88 |
215 |
|
+ |
216 |
views.py ¶
8 additions and 1 deletion.
1 |
1 |
|
+ |
2 |
|
2 |
3 |
from django.shortcuts import get_object_or_404, render # render() lets the view render a template directly. It's pretty cool and important. |
3 |
4 |
from django.http import HttpResponseRedirect, HttpResponse |
4 |
5 |
from django.core.urlresolvers import reverse # Why? |
5 |
6 |
from django.template import loader # This allows actually loading the template. |
6 |
7 |
from django.contrib.auth.decorators import login_required |
7 |
8 |
from django.contrib.auth import authenticate, login |
8 |
9 |
from .models import Post |
9 |
10 |
from django.core.exceptions import ObjectDoesNotExist |
10 |
11 |
from markdown import markdown |
11 |
12 |
from django.utils import translation |
12 |
13 |
|
13 |
14 |
# FIXME: Remove this template trash. THIS IS A VIEW, NOT A FUCKING TEMPLATE FFS |
14 |
15 |
context = { |
15 |
16 |
'materialDesign_color': "green", |
16 |
17 |
'materialDesign_accentColor': "purple", |
17 |
18 |
'navbar_title': "Blog", |
18 |
19 |
'navbar_fixed': True, |
19 |
20 |
'navbar_backArrow': True, |
20 |
21 |
#'footer_title': "Maarten's blog", |
21 |
22 |
#'footer_description': "My personal scribbly notepad.", |
22 |
23 |
#'footer_links': footer_links, |
23 |
24 |
} |
24 |
25 |
|
25 |
26 |
def get_available_post_languages(post): |
+ |
27 |
# TODO: This still uses Pandoc to convert the file in the background to HTML |
+ |
28 |
# code. That's a pretty bad solution (doesn't mean Pandoc is bad though). |
+ |
29 |
# Remember to write a custom implementation when there's time available. |
+ |
30 |
return subprocess.check_output(["pandoc", file_path]) |
+ |
31 |
|
+ |
32 |
def get_available_post_languages(post): |
26 |
33 |
""" Returns the language codes for which a blog post exists. This function |
27 |
34 |
always returns English (because that field mustn't be empty). |
28 |
35 |
So say a blog post has an English, Dutch and French version (which means |
29 |
36 |
english_file, french_file and dutch_file aren't empty), the function will return {"en", |
30 |
37 |
"fr", "nl"}. """ |
31 |
38 |
available_languages = {"en"} |
32 |
39 |
if post.german_file is not None: |
33 |
- | available_languages.add("de") |
+ |
40 |
available_languages.add("de") |
34 |
41 |
if post.spanish_file is not None: |
35 |
42 |
available_languages.add("es") |
36 |
43 |
if post.french_file is not None: |
37 |
44 |
available_languages.add("fr") |
38 |
45 |
if post.dutch_file is not None: |
39 |
46 |
available_languages.add("nl") |
40 |
47 |
return available_languages |
41 |
48 |
|
42 |
49 |
def get_preferred_post_language(post, language): |
43 |
50 |
""" Returns the post language file that best suits the given language. This |
44 |
51 |
is handy if you know what language the user prefers, but aren't sure whether |
45 |
52 |
you can provide that language. This function will try to provide the file |
46 |
53 |
for that language, or return English if that's not possible. """ |
47 |
54 |
if language == "de" and post.german_file is not None: |
48 |
55 |
return post.german_file |
49 |
56 |
if language == "es" and post.spanish_file is not None: |
50 |
57 |
return post.spanish_file |
51 |
58 |
if language == "fr" and post.french_file is not None: |
52 |
59 |
return post.french_file |
53 |
60 |
if language == "nl" and post.dutch_file is not None: |
54 |
61 |
return post.dutch_file |
55 |
62 |
return post.english_file # Returned if all other choices wouldn't be satisfactory, or the requested language is English. |
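Review note: the two helpers above repeat the same check once per language. A table-driven sketch of the same behaviour, assuming the attribute names shown in the view code; `LANGUAGE_FIELDS`, `preferred_file`, `available_languages` and the `SimpleNamespace` stand-in for a `Post` are hypothetical:

```python
from types import SimpleNamespace

# Language code -> attribute name on the post object (names taken from the view code).
LANGUAGE_FIELDS = {
    "de": "german_file",
    "es": "spanish_file",
    "fr": "french_file",
    "nl": "dutch_file",
}

def preferred_file(post, language):
    """Return the file for the requested language, falling back to English."""
    field_name = LANGUAGE_FIELDS.get(language)
    candidate = getattr(post, field_name, None) if field_name else None
    return candidate if candidate is not None else post.english_file

def available_languages(post):
    """Return the language codes that have a file; English is always included."""
    found = {"en"}
    found.update(
        code for code, name in LANGUAGE_FIELDS.items()
        if getattr(post, name, None) is not None
    )
    return found

# Stand-in for a Post instance, purely for illustration.
demo = SimpleNamespace(
    english_file="en.md", german_file=None, spanish_file=None,
    french_file="fr.md", dutch_file=None,
)
print(preferred_file(demo, "fr"))
print(preferred_file(demo, "de"))
print(sorted(available_languages(demo)))
```

Adding a language then means adding one dictionary entry instead of touching both functions.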
56 |
63 |
|
57 |
64 |
|
58 |
65 |
def index(request): |
59 |
66 |
template = "blog/index.html" |
60 |
67 |
posts = Post.objects.all() |
61 |
68 |
language = translation.get_language() |
62 |
69 |
|
63 |
70 |
post_links = [] |
64 |
71 |
for post in posts: |
65 |
72 |
blog_file = get_preferred_post_language(post, language) |
66 |
73 |
# TODO: Find a cleaner way to determine the title. First and foremost: |
67 |
74 |
# If the language differs from English, the other language file needs to |
68 |
75 |
# be loaded. Plus: look for a built in function to remove the full path |
69 |
76 |
# and only return the file name. |
70 |
77 |
title = (blog_file.name.rpartition("/")[2]).rpartition(".")[0] |
71 |
78 |
date = post.published |
72 |
79 |
description = "Lorem ipsum" |
73 |
80 |
# TODO: The link can possibly be reversed in the DTL using the title, which is actually |
74 |
81 |
# a cleaner way to do it. Investigate. |
75 |
82 |
link = reverse("blog-post", args=[str(post)]) |
76 |
83 |
post_links.append([title, date, description, link]) |
77 |
84 |
|
78 |
85 |
context = { |
79 |
86 |
'post_links': post_links, |
80 |
87 |
} |
81 |
88 |
return render(request, template, context) |
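Review note: the TODO in the loop above asks for a built-in way to strip the path and extension from the file name. One option is `pathlib`, sketched here under the assumption that the stored names use forward slashes (as the `rpartition("/")` call implies); the helper name `title_from_path` is hypothetical:

```python
from pathlib import PurePosixPath

def title_from_path(name):
    """Return the file name without directories or the final extension."""
    return PurePosixPath(name).stem

print(title_from_path("posts/2017/Hello world.md"))
```

One behavioural difference to keep in mind: for a name without any dot, `rpartition(".")[0]` yields an empty string, while `stem` returns the whole file name.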
82 |
89 |
|
83 |
90 |
def post(request, title): |
84 |
91 |
template = "blog/post.html" |
85 |
92 |
posts = Post.objects.get(english_file=title) |
86 |
93 |
language = translation.get_language() |
87 |
94 |
blog_file = get_preferred_post_language(posts, language) # Pass the fetched Post object, not the view function named post |
88 |
95 |
blog_text = markdown(blog_file) |
89 |
96 |
|
90 |
97 |
context = { |
91 |
98 |
'article': blog_text, |
92 |
99 |
'title': blog_file.name, |
93 |
100 |
} |
94 |
101 |
return render(request, template, context) |
95 |
102 |