blog | Gitar

New attempt to make markdown converter work

And that's eum... Pretty hard. I don't know why, but I can only imagine my regexes become so fucking ugly, that I can't do this properly in Python without creating a burden of problems and bugs. Putting this in the commit for historic reference.

Author: Maarten 'Vngngdn' Vangeneugden
Date: Sept. 28, 2017, 11:50 p.m.
Hash: 5a08d014792d6a8417ef684c8e0a22eb6e6ac351
Parent: 772aedaf0099886a369576cc8f8ab9a6860ced2e
Modified files: markdown.py; models.py

markdown.py

22 additions and 15 deletions.


import re

import pygments

""" So welcome to my Markdown module. Since the markdown library in PyPI is
fucking shit, I've decided to write my own implementation. Contrary to the one in
PyPI, my version handles **all** cases, and is a **full implementation** of the
reference.

Oh, and just so you know: You don't need an entire shitty object oriented system
to make something decent. Sometimes the solution is a function. Period.
"""

"""
Checklist about all shit that must be implemented:
    - headers need to have their IDs be the same as the title. BUT! IDs
      mustn't have spaces, and need to be unique. The latter isn't that big of a
      deal, but spaces in the header title must be converted to dashes.
    - HTML code needs to be escaped; & must become &amp;, < and > become &lt;
      and &gt; and so on. This isn't necessary for UTF-8 symbols such as ©,
      which can be put in place as is, instead of converting to &copy;.
    - Some elements have to be placed in the tag itself, such as links in <a />.
      This is noted with the {#} tags. The context in which they are used in the
      defaults should give a good explanation on what number points to what.
    - Remember to support 2 trailing spaces as <br />!
    - There are also "closing ATX headers": "# title" is the same as 
      "# title ####" and "# title #". (So it's purely cosmetic, remove the
      trailing whitespace in these cases)
    - When code is used, call Pygments to markup the code properly. If a code
      tag is provided (e.g. "Python", "C", ...), tell that to Pygments as well,
      so it can do a better job. If nothing is provided, leave it as is. When
      it's an inline code block (`CODE`), leave that always as is.
      Look how to do it at
      <http://docs.getpelican.com/en/stable/content.html#syntax-highlighting>.

Future expansions:
    - Allow nesting of more elements. For example: Headers cannot be nested in
      blockquotes, but this is a nice thing to have.
    - Allow headers to follow a line wrapping, if the next line is preceded by
      the same amount of hashtags (=> same header level).
    - Allow the special paragraph blockquote style:
      https://daringfireball.net/projects/markdown/syntax#blockquote
    """


def toHTML(
        text,
        emphasis = r"<em>{text}</em>",
        strong = r"<strong>{text}</strong>",
        unordered_list = r"<ul>{items}</ul>",
        ordered_list = r"<ol>{items}</ol>",
        list_item = r"<li>{text}</li>",
        hyperlink = r'<a href="{link}" title="{title}">{text}</a>',
        image = r'<img src="{link}" alt="{alt}" title="{title}" />',
        paragraph = r"<p>{text}</p>",
        blockquote = r"<blockquote>{text}</blockquote>",
        header1 = r'<h1 id="{link}">{text}</h1>',
        header2 = r'<h2 id="{link}">{text}</h2>',
        header3 = r'<h3 id="{link}">{text}</h3>',
        header4 = r'<h4 id="{link}">{text}</h4>',
        header5 = r'<h5 id="{link}">{text}</h5>',
        header6 = r'<h6 id="{link}">{text}</h6>',
        code = r'<code lang="\g<language>">\g<code></code>',
        inline_code = r"<code>\g<code></code>",  # Assumed default: used below but never defined in this commit
        incorrect = r"<s>{text}</s>",
        line_break = r"<br />",
        horizontal_rule = r"<hr />",
        ):
    """ Translates Markdown code to HTML code.

    This is a pure function.

    This function will translate given Markdown code to HTML code.
    It follows the specification as closely as possible, with a few custom additions:
    - Incorrect text can be marked with "~" around a text block.

    The default parameters have sane defaults, but can be customized if you wish
    to do so. Pay attention to the tags, as your custom value must also
    incorporate these.

    The function works in a simple way:
    1. Replace all redundant content with only 1 unique part
    1.1. For example: 5 blank lines mean the same as 2; a line with only spaces
         and tabs means the same as an empty line; hashtags at the end of a header
         line are meaningless; ...
    2. Handle blockquotes. Blockquotes have the highest precedence and can contain
       any other element, thus it's easiest to just handle these as soon as possible.
    3. Replace Setext with atx-style headers, to provide consistency for header handling.
    4. Handle block elements (paragraphs, code, ...).
    5. In all block elements, handle span elements (links, emphasis, ...).
    """

    # Replacing some shit:
    text = re.sub(r"^[ \t]+$", "", text, flags=re.MULTILINE)  # Make all blank lines consistent
    text = re.sub(r"\n{3,}", "\n\n", text)  # Collapse redundant blank lines (3+ newlines) into 2 newlines

    # XXX: Blockquotes have the highest precedence: **ANYTHING** can be nested
    # in a blockquote. So, handle these first, and convert them up front to
    # make it easier to handle the other text.


    """ About handling blockquotes:
    Every line that starts with "> " is a blockquote. As long as the next line
    starts in the same way, it's considered part of the same blockquote.
    **However**, there is 1 exception to this rule:
    paragraphs that are hard-wrapped only need 1 > for their first line, but can
    then be hard wrapped, and even start without prior spacing.
    """
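    # Named blockquote_pattern so the `blockquote` template parameter is not shadowed.
    blockquote_pattern = re.compile(r"(?:^> .+\n)+", flags=re.MULTILINE)
    quote = blockquote_pattern.search(text)
    while quote is not None:
        begin, end = quote.span()
        # Strip the leading "> " from every quoted line; nested quotes keep
        # their inner "> " and are picked up by the next iteration.
        inner = re.sub(r"^> ", "", text[begin:end], flags=re.MULTILINE)
        reworked = "<blockquote>\n" + inner + "</blockquote>\n"
        text = text[:begin] + reworked + text[end:]
        quote = blockquote_pattern.search(text)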

    # All blockquotes are now removed

    # Converting setext to atx headers
    text = re.sub(r"^(?P<title>.+)\n=+$", r"# \g<title>", text, flags=re.MULTILINE)
    text = re.sub(r"^(?P<title>.+)\n-+$", r"## \g<title>", text, flags=re.MULTILINE)
    # All are now converted to atx style headers
    # Transforming headers:
    for i in range(1,7):
        header = r"^#{"+str(i)+r"} (?P<title>.+)$"
        match = re.search(header, text, flags=re.MULTILINE)
        while match is not None:
            future_id = match['title'].lower()
            future_id = re.sub(r"[ _,.!]", r"-", future_id)
            future_id = re.sub(r" ", r"-", future_id)
            dictionary = match.groupdict()
            dictionary['link'] = future_id
            replacement = (r'<h'+str(i)+r' id="{link}">{title}</h'+str(i)+r'>').format_map(dictionary)
            text = text[:match.start()] + replacement + text[match.end():]
            match = re.search(header, text, flags=re.MULTILINE)

    # All headers transformed

    # Paragraphs
    text = re.sub(r"(?P<text>(?:^(?!<).+\n)+)", r"<p>\n\g<text></p>", text, flags=re.MULTILINE)


    # Doing inline hyperlinks
    text = re.sub(r"\[(?P<text>.+)\]\((?P<url>.+) \"(?P<title>.+)\"\)", r'<a href="\g<url>" title="\g<title>">\g<text></a>', text)
    text = re.sub(r"\[(?P<text>.+?)\]\((?P<url>.+?)\)", r'<a href="\g<url>">\g<text></a>', text, flags=re.S)

    # Doing strongs
    text = re.sub(r"\*\*(?P<text>.+?)\*\*", r"<strong>\g<text></strong>", text, flags=re.S)
    text = re.sub(r"__(?P<text>.+?)__", r"<strong>\g<text></strong>", text, flags=re.S)
    # Doing emphasis
    text = re.sub(r"\*(?P<text>.+?)\*", r"<em>\g<text></em>", text, flags=re.S)
    text = re.sub(r"_(?P<text>.+?)_", r"<em>\g<text></em>", text, flags=re.S)
    # Code blocks (re.MULTILINE so ^ and $ match at every line, not only at the ends of the text)
    text = re.sub(r"^```(?P<language>.+?)\n(?P<code>.+?)\n```$", code, text, flags=re.S | re.MULTILINE)
    # Doing inline code
    text = re.sub(r"``(?P<code>.+?)``", inline_code, text, flags=re.S)
    text = re.sub(r"`(?P<code>.+?)`", inline_code, text, flags=re.S)
    # Horizontal rules
    text = re.sub(r"^((\*|_|-) *){3,}$", horizontal_rule, text, flags=re.MULTILINE)
    # Line breaks
    text = re.sub(r"  $", line_break, text, flags=re.MULTILINE)

    return text
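
The checklist at the top of markdown.py asks for header IDs without spaces that are also unique, and for closing ATX hashes to be ignored. The header loop in `toHTML` does not yet guarantee either. What follows is only a minimal sketch of one way to do it: `make_header_id`, `strip_closing_hashes`, and the numeric `-2`, `-3` suffix scheme are assumptions for illustration, not part of this commit.

```python
import re


def strip_closing_hashes(line):
    """Drop cosmetic closing hashes: "# title ####" becomes "# title"."""
    return re.sub(r"[ \t]+#+[ \t]*$", "", line)


def make_header_id(title, used_ids):
    """Build a space-free id from a title, unique within the set `used_ids`."""
    # Same character class as the header loop in toHTML, but runs collapse to one dash.
    base = re.sub(r"[ _,.!]+", "-", title.strip().lower()).strip("-")
    candidate = base
    suffix = 2
    # Hypothetical uniqueness rule: append -2, -3, ... until the id is free.
    while candidate in used_ids:
        candidate = "{}-{}".format(base, suffix)
        suffix += 1
    used_ids.add(candidate)
    return candidate


seen = set()
for raw in ["# Trains are fun ##", "# Trains are fun"]:
    title = strip_closing_hashes(raw)[2:]
    print(make_header_id(title, seen))
```

Keeping the set of used IDs outside the function leaves it easy to call once per header while the document is scanned top to bottom.
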
"""





    block_elements_table = {
        "code": r"```(?P<language>\w+)\n(    .*\n)+",
        "blockquote": r"^> (?P<text>.+)
        "paragraph": r"(?P<text>(^.+\n)+)",
        "header": r"^#{1,6} (?P<title>(\w+ ?)+ *) ?#*$",


    element_table = {
        "emphasis": (r"\*(?P<text>[^\*.]*)\*|_(?P<text>[^\_.]*)_", emphasis, emphasis_end),
        "strong": (r"\*\*(?P<text>[^*.]*)\*\*|__(?P<text>[^\_.]*)__", strong, strong_end),
        "unordered list": (r"")
        "inline link": (r"\[(\w\s)+\]\(


    def translate(text, begin, end, parameters):

        if alpha:  # If this contains no more nested elements:
            return begin.format(parameters) + text + end
        elif beta:  # text contains nested elements:
            # Find parameters or something IDK
            return begin.format(parameters) + _
        translate(text[alpha:beta], begin_tag, end_tag, found_parameters) + _
        end

    # Zoom zoom insert magic code here

    # NOTE: Hyperlinks are handled specially in Markdown. Check the syntax page
    # for more information. That said, it's imperative to **first** collect all
    # information about hyperlinks, and remove it, so it can be used when
    # parsing hyperlinks.

    # Table of all elements and their respective regular expression:
    elements = {
        paragraph: r"",
        ordered_list_item: r""
        hyperlink:
        header1: r"^# [*\(\n)] \n"
    }

"""
""" The reason the length is stored instead of the end, is because it is
    less error prone; if a parent node is updated, only the begin needs to be
    updated, as the length is still the same for the node. The begin can be
    relative to the parent node, so even that won't have to be updated. """
"""
    node = {
            "type": block_type,
            "begin": begin,
            "length": length,
            "children": children,
            }

    return markdown_code
"""

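The note at the end of markdown.py about storing a node's length rather than its end can be illustrated with a small sketch. Only the `type`, `begin`, `length`, and `children` fields come from the commented-out dictionary; the `Node` class, its `parent` link, the `add`, `absolute_begin`, and `text` helpers, and the sample offsets are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical parse-tree node; `begin` is relative to the parent node."""
    type: str
    begin: int
    length: int
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def absolute_begin(self):
        # Walk up the tree, summing the relative offsets.
        offset = self.begin
        node = self.parent
        while node is not None:
            offset += node.begin
            node = node.parent
        return offset

    def text(self, source):
        start = self.absolute_begin()
        return source[start:start + self.length]


source = "intro\n\n> quoted *words* here\n"
root = Node("document", 0, len(source))
quote = root.add(Node("blockquote", 7, 22))
emphasis = quote.add(Node("emphasis", 9, 7))
print(repr(emphasis.text(source)))

# Prepend two characters: only the blockquote's begin and the root's length change.
shifted = "xx" + source
root.length += 2
quote.begin += 2
print(repr(emphasis.text(shifted)))
```

Because `begin` is relative to the parent, the emphasis node is untouched by the shift and still resolves to the same text.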

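The checklist also asks for HTML escaping of `&`, `<` and `>`, while leaving UTF-8 symbols such as © alone. A minimal sketch using the standard library's `html.escape` is shown here; `render_inline` is a hypothetical helper that handles only `**strong**`, and it assumes escaping runs before any tags are generated. One caveat under that assumption: escaping the whole text up front would also turn the `>` of blockquote markers into `&gt;`, so block-level markers would need to be handled first.

```python
import html
import re


def render_inline(text):
    """Escape HTML-sensitive characters, then convert **strong** spans."""
    # quote=False leaves quotation marks alone; &, < and > are still escaped.
    escaped = html.escape(text, quote=False)
    return re.sub(r"\*\*(?P<text>.+?)\*\*", r"<strong>\g<text></strong>", escaped)


print(render_inline("**a < b** & c"))
print(render_inline("© 2017"))
```
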
models.py

21 additions and 0 deletions.


from django.db import models
import datetime
import os

def post_title_directory(instance, filename):
    """ Files will be uploaded to MEDIA_ROOT/blog/<year of publishing>/<blog
    title>
    The blog title is determined by the text before the first period (".") in
    the filename. So if the file has the name "Trains are bæ.en.md", the file
    will be stored in "blog/<this year>/Trains are bæ". Name your files
    properly!
    It should also be noted that all files are stored in the same folder if they
    belong to the same blogpost, regardless of language. The titles that are
    displayed to the user however, should be the titles of the files themselves,
    which should be in the native language. So if a blog post is titled
    "Universities of Belgium", its Dutch counterpart should be titled
    "Universiteiten van België", so the correct title can be derived from the
    filename.

    Recommended way to name the uploaded file: "<name of blog post in language
    it's written>.md". This removes the maximum amount of redundancy (e.g. the
    language of the file can be derived from the title, no ".fr.md" or something
    like that necessary), and can directly be used for the end user (the title
    is what should be displayed).
    """
    english_file_name = os.path.basename(instance.english_file.name) # TODO: Test if this returns the file name!
    english_title = english_file_name.partition(".")[0]  # Text before the first period, as the docstring describes
    year = datetime.date.today().year

    return "blog/{0}/{1}/{2}".format(year, english_title, filename)

class Post(models.Model):
    """ Represents a blog post. The title of the blog post is determined by the name
    of the files.
    A blog post can be in 5 different languages: German, Spanish, English, French,
    and Dutch. For all these languages, a separate field exists. Thus, a
    translated blog post has a separate file for each translation, and is
    separated from Django's internationalization/localization system.
    Only the English field is mandatory. The others may contain a value if a
    translated version exists, which will be displayed accordingly.
    """
    published = models.DateTimeField(auto_now_add=True)
    english_file = models.FileField(upload_to=post_title_directory, unique=True, blank=False)
    dutch_file = models.FileField(upload_to=post_title_directory, blank=True)
    french_file = models.FileField(upload_to=post_title_directory, blank=True)
    german_file = models.FileField(upload_to=post_title_directory, blank=True)
    spanish_file = models.FileField(upload_to=post_title_directory, blank=True)
    # Only the English file can be unique, because apparently, there can't be
    # two blank fields in a unique column. Okay then.

    def __str__(self):
        return os.path.basename(self.english_file.name).rpartition(".")[0]

class Comment(models.Model):
    """ Represents a comment on a blog post.

    Comments are not linked to an account or anything, I'm trusting the
    commenter that he is honest with his credentials. That being said:
    XXX: Remember to put up a notification that comments are not checked for
    identity, and, unless verified by a trustworthy source, cannot be seen as
    being an actual statement from the commenter.
    Comments are linked to a blogpost, and are not filtered by language. (So a
    comment made by someone reading the article in Dutch, that's written in
    Dutch, will show up (unedited) for somebody who's reading the Spanish
    version.)
    XXX: Remember to notify (tiny footnote or something) that comments showing
    up in a foreign language is by design, and not a bug.
    """
    date = models.DateTimeField(auto_now_add=True)
    name = models.TextField()
    mail = models.EmailField()
    post = models.ForeignKey(Post, on_delete=models.CASCADE) # TODO: Finish this class and the shit...
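
The docstring of `post_title_directory` says the blog title is the text before the first period of the English file name, and that every translation lands in the same folder. A Django-free sketch of that rule follows; `title_from_filename` and `upload_path` are hypothetical names, and the explicit `year` argument is an assumption added so the result can be checked without depending on today's date.

```python
import datetime
import os


def title_from_filename(file_name):
    """Return the text before the first period of the file's base name."""
    return os.path.basename(file_name).partition(".")[0]


def upload_path(english_file_name, filename, year=None):
    """Build blog/<year>/<English title>/<uploaded file name>."""
    if year is None:
        year = datetime.date.today().year
    title = title_from_filename(english_file_name)
    return "blog/{0}/{1}/{2}".format(year, title, filename)


print(upload_path("Trains are bæ.en.md", "Treinen zijn bæ.nl.md", year=2017))
```

Note that `str.partition` splits at the first period while `str.rpartition` splits at the last one, so the two only agree when the file name contains a single period.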



