Add more info to markdown.py about implementation
I've been thinking about the best way to write a Markdown 'transpiler' in Python. I've settled on a tree-based solution, in which every block is represented as a node in a tree. See markdown.py for more information.
- Author: Maarten 'Vngngdn' Vangeneugden
- Date: July 18, 2017, 6:18 p.m.
- Hash: 62e5f976d88bc53b2dd2fedcff0d02f137c83f7d
- Parent: d3365599fb04a10e62959e03fc0ee9884a656800
- Modified file: markdown.py
markdown.py
20 additions and 3 deletions.
1 |
1 |
fucking shit, I've decided to write my own implementation. Contrary to the one in |
2 |
2 |
PyPI, my version handles **all** cases, and is a **full implementation** of the |
3 |
3 |
reference. |
4 |
4 |
|
5 |
5 |
Oh, and just so you know: You don't need an entire shitty object oriented system |
6 |
6 |
to make something decent. Sometimes the solution is a function. Period. |
7 |
7 |
""" |
8 |
8 |
|
9 |
9 |
""" |
10 |
10 |
Checklist about all shit that must be implemented: |
11 |
11 |
- headers need to have their IDs be the same as the title. BUT! IDs |
12 |
12 |
mustn't have spaces, and need to be unique. The latter isn't that big of a |
13 |
13 |
deal, but spaces in the header title must be converted to dashes. |
14 |
14 |
- HTML code needs to be escaped; & must become &amp;, < and > become &lt; |
15 |
15 |
and &gt; and so on. This isn't necessary for UTF-8 symbols such as ©, |
16 |
16 |
which can be put in place as is, instead of converting to &copy;. |
17 |
17 |
- Some elements have to be placed in the tag itself, such as links in <a />. |
18 |
18 |
This is noted with the {#} tags. The context in which they are used in the |
19 |
19 |
defaults should give a good explanation on what number points to what. |
20 |
20 |
- Remember to support 2 trailing spaces as <br />! |
21 |
21 |
- There are also "closing ATX headers": "# title" is the same as |
22 |
22 |
"# title ####" and "# title #". (So it's purely cosmetic, remove the |
23 |
23 |
trailing whitespace in these cases) |
24 |
24 |
- When code is used, call Pygments to markup the code properly. If a code |
25 |
25 |
tag is provided (e.g. "Python", "C", ...), tell that to Pygments as well, |
26 |
26 |
so it can do a better job. If nothing is provided, leave it as is. When |
27 |
27 |
it's an inline code block (`CODE`), leave that always as is. |
28 |
28 |
Look how to do it at |
29 |
29 |
<http://docs.getpelican.com/en/stable/content.html#syntax-highlighting>. |
30 |
30 |
""" |
31 |
31 |
|
32 |
32 |
def toHTML( |
33 |
33 |
file, |
34 |
34 |
emphasis = "<em>", |
35 |
35 |
emphasis_end ="</em>", |
36 |
36 |
strong = "<strong>", |
37 |
37 |
strong_end = "</strong>", |
38 |
38 |
unordered_list = "<ul>", |
39 |
39 |
unordered_list_end = "</ul>", |
40 |
40 |
ordered_list = "<ol>", |
41 |
41 |
ordered_list_end = "</ol>", |
42 |
42 |
list_item = "<li>", |
43 |
43 |
list_item_end = "</li>", |
44 |
44 |
hyperlink = "<a href=\"{0}\" title=\"{1}\">", |
45 |
45 |
hyperlink_end = "</a>", |
46 |
46 |
image = "<img src=\"{0}\">", |
47 |
47 |
image_end = "</img>", |
48 |
48 |
paragraph = "<p>", |
49 |
49 |
paragraph_end = "</p>", |
50 |
50 |
blockquote = "<quote>", |
51 |
- | blockquote_end = "</quote>", |
52 |
- | header1 = "<h1 id=\"{0}\">", |
+ |
51 |
blockquote_end = "</blockquote>", |
+ |
52 |
header1 = "<h1 id=\"{0}\">", |
53 |
53 |
header1_end = "</h1>", |
54 |
54 |
header2 = "<h2 id=\"{0}\">", |
55 |
55 |
header2_end = "</h2>", |
56 |
56 |
header3 = "<h3 id=\"{0}\">", |
57 |
57 |
header3_end = "</h3>", |
58 |
58 |
header4 = "<h4 id=\"{0}\">", |
59 |
59 |
header4_end = "</h4>", |
60 |
60 |
header5 = "<h5 id=\"{0}\">", |
61 |
61 |
header5_end = "</h5>", |
62 |
62 |
header6 = "<h6 id=\"{0}\">", |
63 |
63 |
header6_end = "</h6>", |
64 |
64 |
code = "<code>", |
65 |
- | code_end = "</code>", |
+ |
65 |
code_end = "</code>", |
66 |
66 |
): |
67 |
67 |
# Zoom zoom insert magic code here |
68 |
68 |
|
+ |
69 |
file_content = file.read() |
+ |
70 |
|
69 |
71 |
|
+ |
72 |
# for more information. That said, it's imperative to **first** collect all |
+ |
73 |
# information about hyperlinks, and remove it, so it can be used when |
+ |
74 |
# parsing hyperlinks. |
+ |
75 |
|
+ |
76 |
""" The reason the length is stored instead of the end, is because it is |
+ |
77 |
less error prone; if a parent node is updated, only the begin needs to be |
+ |
78 |
updated, as the length is still the same for the node. The begin can be |
+ |
79 |
relative to the parent node, so even that won't have to be updated. """ |
+ |
80 |
node = { |
+ |
81 |
"type": block_type, |
+ |
82 |
"begin": begin, |
+ |
83 |
"length": length, |
+ |
84 |
"children": children, |
+ |
85 |
} |
+ |
86 |
|
70 |
87 |
return markdown_code |
71 |
88 |
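
Several items from the checklist in the docstring are small enough to sketch as plain functions. The following is only an illustration under stated assumptions, not code from markdown.py: the function names (`strip_closing_hashes`, `header_id`, `parse_atx_header`, `hard_break`) are hypothetical, and the `-2`, `-3` suffix used to keep IDs unique is an assumed scheme, since the checklist only says IDs need to be unique. Escaping relies on `html.escape` from the standard library, which leaves characters such as © untouched.

```python
import html
import re


def strip_closing_hashes(title):
    """Drop a cosmetic closing ATX sequence: 'title ####' becomes 'title'."""
    return re.sub(r"\s+#+\s*$", "", title).strip()


def header_id(title, used_ids):
    """Build an ID from the title: whitespace becomes dashes.

    Duplicates get a numeric suffix (an assumed scheme, not from the source).
    """
    base = re.sub(r"\s+", "-", title.strip())
    candidate = base
    counter = 2
    while candidate in used_ids:
        candidate = f"{base}-{counter}"
        counter += 1
    used_ids.add(candidate)
    return candidate


def parse_atx_header(line, used_ids):
    """Convert one ATX header line to HTML, or return None if it is no header."""
    match = re.match(r"^(#{1,6})\s+(.*)$", line)
    if match is None:
        return None
    level = len(match.group(1))
    title = strip_closing_hashes(match.group(2))
    ident = html.escape(header_id(title, used_ids))  # safe inside an attribute
    text = html.escape(title, quote=False)           # & < > become entities
    return f'<h{level} id="{ident}">{text}</h{level}>'


def hard_break(line):
    """Two or more trailing spaces mark a line break."""
    if line.endswith("  "):
        return line.rstrip(" ") + "<br />"
    return line


used = set()
first = parse_atx_header("# My title ####", used)
second = parse_atx_header("## My title #", used)
```

Keeping the set of used IDs outside the functions matches the "sometimes the solution is a function" stance of the docstring: the caller owns the state and passes it in.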
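
The node layout added in this commit stores a `begin` relative to the parent plus a `length`, rather than an absolute end. A minimal sketch of how that could play out is shown below. It reuses the four keys from the diff, but `make_node`, `absolute_spans` and the sample text and offsets are assumptions made for the example, not part of markdown.py.

```python
def make_node(block_type, begin, length, children=None):
    """Build a node shaped like the dict in the diff (hypothetical helper)."""
    return {
        "type": block_type,
        "begin": begin,    # offset relative to the parent's begin
        "length": length,  # number of characters covered by the node
        "children": children if children is not None else [],
    }


def absolute_spans(node, parent_begin=0):
    """Yield (type, absolute_begin, absolute_end) for a node and its descendants."""
    begin = parent_begin + node["begin"]
    yield node["type"], begin, begin + node["length"]
    for child in node["children"]:
        yield from absolute_spans(child, begin)


text = "intro\n\n> quoted *text*\n"
emphasis = make_node("emphasis", 9, 6)               # "*text*", relative to the quote
quote = make_node("blockquote", 7, 15, [emphasis])   # "> quoted *text*"
document = make_node("document", 0, len(text), [quote])

spans_before = list(absolute_spans(document))

# Insert four characters in front of the blockquote. Only the quote's begin
# (and the document's length, since the text grew) has to change; the
# emphasis node is left untouched because its begin is relative.
shifted_text = text[:7] + "abc\n" + text[7:]
quote["begin"] += 4
document["length"] += 4

spans_after = list(absolute_spans(document))
```

With absolute begin and end values, the same insertion would have required rewriting every descendant of the blockquote, which is the error-prone case the comment in the diff warns about.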