Disclaimer
Purpose: This static app is an educational prototype for language practice and developer sharing. It helps learners prepare polite Tagalog phrases for AWS Manila Community Day.
Non-Commercial: This project has no paid feature, no advertising, no registration requirement, and no commercial purpose. It is intended for learning, experimentation, and community preparation.
No Guarantees: Generated language content may contain mistakes. Tagalog translations, grammar explanations, pronunciation guides, and cultural notes should be checked by native speakers before production use.
Scope: This app is not an official AWS product, not an official translation tool, and not a substitute for human language instruction. It is a technical demo with useful learning content.
Community Respect: The app should avoid stereotypes and should teach polite language carefully. Words like po, opo, kayo, and ninyo should be explained as tools for respect, not decoration.
Demo
Paki-check po kung pumasok ang bayad.
Grammatical Breakdown
- Paki-check: Filipino-English request phrase meaning "please check."
- po: Respect marker used for polite speech.
- kung: Means "if" or "whether."
- pumasok: Means "entered" or "came in" in this payment context.
- ang: Focus marker placed before the main noun or idea.
- bayad: Means "payment."
Pronunciation Guide
It is pronounced word by word as:
pah-kee-chehk poh koong poo-mah-sohk ahng bah-yahd.
- Paki-check: pah-kee-chehk.
- po: poh.
- kung: koong.
- pumasok: poo-mah-sohk.
- ang: ahng.
- bayad: bah-yahd.
Content Snapshot
Core enrichment tasks:
- Find every div.extra-example block
- Read the Tagalog sentence from span
- Generate grammar breakdown list items
- Generate pronunciation guide text and chunk list
- Replace old sections in the HTML
- Write updated article files
- Print sanity checks
Table of Contents
Part 1: Use Batch Scripts For Article Groups
This section explains why the enrichment work is split across multiple scripts instead of one giant file.
Part 2: Treat Glossaries As Small Local Knowledge Bases
This section shows how the scripts use dictionary entries for beginner-friendly grammar meanings and local loanword explanations.
Part 3: Generate Pronunciation From Known Words And Fallback Rules
This section explains the pronunciation map, token handling, vowel fallback, and chunked beginner output.
Part 4: Patch HTML With BeautifulSoup
This section shows the code pattern that finds existing sections and replaces only the content after a target heading.
Part 5: Validate Enriched Output
This section explains why the scripts print counts for extra examples, pronunciation phrases, local tips, and grammar breakdowns.
Part 1: Use Batch Scripts For Article Groups
Goal
Improve the grammar breakdowns and pronunciation guides across 24 article pages without hand-editing every card.
Development skill
The key skill is controlled batch processing. Each script owns a small group of article files and a glossary tuned to that group.
files = [
    "article-22-manila-daily-home-laundry-bills-and-errands.html",
    "article-23-manila-daily-work-study-and-social-plans.html",
    "article-24-manila-daily-health-safety-weather-and-money.html",
]
This pattern is easier to review than one huge script because each batch can carry topic-specific words. Community Day pages need words such as registration, workshop, badge, and volunteer. Manila Daily pages need words such as laundry, delivery, battery, cash, clinic, and medicine.
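One way to keep a batch's scope visible is to hold its file list and its glossary side by side. The sketch below is illustrative only: the Batch dataclass, the describe helper, and the two sample glossary entries are assumptions, not names or wording taken from the project scripts.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    """Hypothetical container: one script's file list plus its topic glossary."""
    name: str
    files: list[str]
    glossary: dict[str, str] = field(default_factory=dict)


def describe(batch: Batch) -> str:
    """Return a one-line scope summary that a reviewer can scan quickly."""
    words = ", ".join(sorted(batch.glossary))
    return f"{batch.name}: {len(batch.files)} files, glossary words: {words}"


manila_daily = Batch(
    name="manila-daily",
    files=[
        "article-22-manila-daily-home-laundry-bills-and-errands.html",
        "article-23-manila-daily-work-study-and-social-plans.html",
        "article-24-manila-daily-health-safety-weather-and-money.html",
    ],
    # Sample entries only; the real glossary wording is not shown in the article.
    glossary={
        "laundry": "English loanword used locally; means laundry.",
        "clinic": "English loanword used locally; means clinic.",
    },
)

print(describe(manila_daily))
```

Printing the description at the top of a run gives reviewers the same scope information as the explicit file list, in one line.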
Prompt
For each article group, update every extra example.
Keep the existing Tagalog sentence.
Replace the grammar breakdown with beginner-friendly word meanings.
Replace the pronunciation guide with word-by-word pronunciation.
Write an updated HTML file and print a summary.
Result
The project gets a repeatable enrichment workflow:
article group
-> topic glossary
-> visible Tagalog sentence
-> grammar list
-> pronunciation guide
-> updated HTML
-> sanity check
Tips
- Split scripts by article topic when the vocabulary changes.
- Keep the file list explicit so reviewers know the script scope.
- Use the same function names across batches.
- Prefer deterministic output over runtime AI generation.
- Print a summary after every batch.
Part 2: Treat Glossaries As Small Local Knowledge Bases
Goal
Turn each script into a small, inspectable language helper.
Development skill
The scripts use dictionaries as local knowledge bases. A word such as po or saan receives a stable beginner explanation, while unknown words fall back to a generic local-use explanation.
defs = {
    "po": "Respect marker used for polite speech.",
    "saan": "Means where.",
    "workshop": "English loanword used locally; means workshop.",
    "badge": "English loanword used locally; means badge.",
}

def get_def(word):
    # token_key is a helper not shown in this excerpt; it maps the word to its dictionary key.
    key = token_key(word)
    if key in defs:
        return defs[key]
    # Unknown words get a generic local-use explanation.
    return f'English loanword or useful word used locally; means "{word}" in this context.'
Why this matters
This approach is not a full grammar parser, but it is useful for a static learning prototype. The learner sees consistent meanings, and the developer can update one dictionary entry when a meaning needs improvement.
Example
Sentence:
Saan po ang registration area?
Generated breakdown:
- Saan: Means where.
- po: Respect marker used for polite speech.
- ang: Focus marker placed before the main noun or idea.
- registration: English loanword used locally; means registration.
- area: English loanword or useful word used locally; means "area" in this context.
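The breakdown above can be reproduced with a few lines of Python. This is a minimal sketch, not the project's code: token_key is not shown in the excerpt, so the version here (strip surrounding punctuation, then lowercase) is an assumption, and make_breakdown_lines is a hypothetical name. The ang and registration entries copy the wording from the generated breakdown.

```python
import string

defs = {
    "po": "Respect marker used for polite speech.",
    "saan": "Means where.",
    "ang": "Focus marker placed before the main noun or idea.",
    "registration": "English loanword used locally; means registration.",
}


def token_key(word):
    """Assumed normalisation: drop surrounding punctuation and lowercase."""
    return word.strip(string.punctuation).lower()


def get_def(word):
    key = token_key(word)
    if key in defs:
        return defs[key]
    return f'English loanword or useful word used locally; means "{word}" in this context.'


def make_breakdown_lines(sentence):
    """Return one 'word: meaning' line per token, keeping the learner's spelling."""
    lines = []
    for raw in sentence.split():
        word = raw.strip(string.punctuation)
        if word:
            lines.append(f"{word}: {get_def(word)}")
    return lines


for line in make_breakdown_lines("Saan po ang registration area?"):
    print("-", line)
```

Because area has no dictionary entry in this sketch, it is the only word that receives the generic fallback text.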
Tips
- Keep definitions short.
- Explain loanwords honestly instead of pretending every word is pure Tagalog.
- Use beginner wording.
- Add topic-specific glossary entries when fallback text appears too often.
- Treat dictionary entries as reviewable content.
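The fourth tip can be backed by a small report that shows which words hit the fallback most often. The sketch below assumes a hypothetical count_fallback_words helper and two sample sentences; neither comes from the project scripts.

```python
import string
from collections import Counter


def count_fallback_words(sentences, glossary):
    """Count how often each word is missing from the glossary."""
    misses = Counter()
    for sentence in sentences:
        for raw in sentence.split():
            key = raw.strip(string.punctuation).lower()
            if key and key not in glossary:
                misses[key] += 1
    return misses


glossary = {"po": "Respect marker used for polite speech.", "saan": "Means where."}
sentences = [
    "Saan po ang registration area?",
    "Saan po ang workshop area?",
]

# Words that hit the fallback most often are the best candidates for new entries.
for word, count in count_fallback_words(sentences, glossary).most_common(3):
    print(f"{word}: {count}")
```

Running a report like this per batch keeps glossary growth tied to the vocabulary that actually appears on the cards.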
Part 3: Generate Pronunciation From Known Words And Fallback Rules
Goal
Give every extra example a pronounceable guide even when not every word is in the pronunciation map.
Development skill
The scripts combine two strategies:
- Known word: use a curated pronunciation and optional syllable chunks.
- Unknown word: use a simple vowel fallback so the learner still gets a readable guide.
Code pattern
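# Curated entries: word -> (full pronunciation, list of (syllable, respelling) chunks).
pron = {
    "salamat": ("sah-lah-maht", [("sa", "sah"), ("la", "lah"), ("mat", "maht")]),
    "kayo": ("kah-yoh", [("ka", "kah"), ("yo", "yoh")]),
    "bayad": ("bah-yahd", [("ba", "bah"), ("yad", "yahd")]),
}

# Vowel respellings used when a word has no curated entry.
vmap = {"a": "ah", "e": "eh", "i": "ee", "o": "oh", "u": "oo"}

def fallback_pron(word):
    output = []
    for character in word.lower():
        if character in vmap:
            output.append(vmap[character])
        elif character.isalpha():
            # Consonants pass through unchanged; digits and punctuation are dropped.
            output.append(character)
    # If nothing pronounceable is left, return the word as given.
    return "".join(output) or word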
Example output
Tagalog:
Uminom po kayo ng tubig dahil mainit.
Pronunciation:
oo-mee-nohm poh kah-yoh nahng too-beeg dah-heel mah-ee-neet.
Chunks:
- Uminom: oo-mee-nohm.
- po: poh.
- kayo: kah-yoh.
- ng: nahng.
- tubig: too-beeg.
- dahil: dah-heel.
- mainit: mah-ee-neet.
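The two strategies can be combined in a single lookup. The following is a minimal sketch under stated assumptions: pronounce_word and pronounce_sentence are hypothetical names, the po entry assumes the same tuple shape as the curated map, and the sample sentence is chosen only to exercise both paths.

```python
import string

# Curated entries (a small sample): word -> (pronunciation, syllable chunks).
pron = {
    "salamat": ("sah-lah-maht", [("sa", "sah"), ("la", "lah"), ("mat", "maht")]),
    "po": ("poh", [("po", "poh")]),
}
vmap = {"a": "ah", "e": "eh", "i": "ee", "o": "oh", "u": "oo"}


def fallback_pron(word):
    """Respell vowels one by one; consonants pass through unchanged."""
    output = []
    for character in word.lower():
        if character in vmap:
            output.append(vmap[character])
        elif character.isalpha():
            output.append(character)
    return "".join(output) or word


def pronounce_word(token):
    """Prefer the curated entry; otherwise use the vowel fallback."""
    key = token.strip(string.punctuation).lower()
    if key in pron:
        return pron[key][0]
    return fallback_pron(key)


def pronounce_sentence(sentence):
    return " ".join(pronounce_word(token) for token in sentence.split())


print(pronounce_sentence("Salamat po sa tubig."))
# sah-lah-maht poh sah toobeeg
```

In this sketch the fallback emits no hyphens, so a reviewer can tell at a glance which words still lack a curated entry.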
Tips
- Curate common words first.
- Keep fallback simple and transparent.
- Preserve acronyms and technical words carefully.
- Aim for pronunciation that is useful for practice rather than over-precise.
- Ask native speakers to review important phrases.
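The tip about acronyms matters because a vowel-by-vowel fallback would respell AWS as "ahws". One possible guard is sketched below. The is_acronym rule (two or more letters, all uppercase) and the safe_pron name are assumptions made for illustration, not part of the project scripts.

```python
vmap = {"a": "ah", "e": "eh", "i": "ee", "o": "oh", "u": "oo"}


def fallback_pron(word):
    """Respell vowels one by one; consonants pass through unchanged."""
    output = []
    for character in word.lower():
        if character in vmap:
            output.append(vmap[character])
        elif character.isalpha():
            output.append(character)
    return "".join(output) or word


def is_acronym(token):
    """Assumed rule: two or more letters, all uppercase (for example AWS)."""
    return len(token) >= 2 and token.isalpha() and token.isupper()


def safe_pron(token):
    """Spell acronyms out letter by letter; send other words to the fallback."""
    if is_acronym(token):
        return "-".join(token)
    return fallback_pron(token)


print(safe_pron("AWS"))    # A-W-S
print(safe_pron("tubig"))  # toobeeg
```

A curated entry remains the better choice for any technical word that learners are expected to say aloud.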
Part 4: Patch HTML With BeautifulSoup
Goal
Update the generated HTML without rewriting the whole article page.
Development skill
The enrichment scripts parse the page with BeautifulSoup, find each div.extra-example, read the Tagalog span, and replace the content after specific headings.
from pathlib import Path

from bs4 import BeautifulSoup

for fname in files:
    soup = BeautifulSoup(Path(fname).read_text(encoding="utf-8"), "html.parser")
    divs = soup.find_all("div", class_="extra-example")
    for div in divs:
        # The visible Tagalog sentence lives in a span marked lang="tl".
        span = div.find("span", lang="tl")
        if not span:
            continue
        # Collapse runs of whitespace so the sentence is one clean line.
        sentence = " ".join(span.get_text(" ", strip=True).split())
        replace_after_heading(div, "Grammatical Breakdown:", [make_breakdown_ul(soup, sentence)])
        replace_after_heading(div, "Pronunciation Guide:", make_pronunciation(soup, sentence))
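The loop calls make_breakdown_ul, which this excerpt does not show. A minimal sketch follows, assuming the function returns a ul element with one li per word and the word wrapped in strong. That markup, the two-entry defs dictionary, and the inline key normalisation are assumptions, not the project's actual output format.

```python
import string

from bs4 import BeautifulSoup

defs = {"po": "Respect marker used for polite speech.", "saan": "Means where."}


def get_def(word):
    key = word.strip(string.punctuation).lower()
    return defs.get(
        key,
        f'English loanword or useful word used locally; means "{word}" in this context.',
    )


def make_breakdown_ul(soup, sentence):
    """Build a <ul> with one <li> per word (assumed markup, not the project's)."""
    ul = soup.new_tag("ul")
    for raw in sentence.split():
        word = raw.strip(string.punctuation)
        if not word:
            continue
        li = soup.new_tag("li")
        strong = soup.new_tag("strong")
        strong.string = f"{word}:"
        li.append(strong)
        li.append(f" {get_def(word)}")
        ul.append(li)
    return ul


soup = BeautifulSoup("<div class='extra-example'></div>", "html.parser")
ul = make_breakdown_ul(soup, "Saan po?")
print(ul)
```

Creating nodes through soup.new_tag keeps the new list attached to the same parsed document, so it can be inserted into the card directly.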
The helper replace_after_heading is important because it avoids replacing the entire card. It removes only the old content that follows the target heading, stopping at the Pronunciation Guide: heading if one comes later, or at the end of the card otherwise.
def replace_after_heading(div, heading_text, new_nodes):
    # Find the heading paragraph: a direct child <p> whose <strong> contains heading_text.
    heading = None
    for candidate in div.find_all("p", recursive=False):
        strong = candidate.find("strong")
        if strong and heading_text in strong.get_text():
            heading = candidate
            break
    if not heading:
        return False

    # Remove old siblings after the heading until the next section heading.
    sibling = heading.find_next_sibling()
    while sibling:
        next_sibling = sibling.find_next_sibling()
        if sibling.name == "p":
            strong = sibling.find("strong")
            # "Pronunciation Guide:" is the only stop heading this helper knows about.
            if strong and "Pronunciation Guide:" in strong.get_text():
                break
        sibling.extract()
        sibling = next_sibling

    # Insert the new nodes in order, directly after the heading.
    last = heading
    for node in new_nodes:
        last.insert_after(node)
        last = node
    return True
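As written, the helper stops only at a Pronunciation Guide: heading, so a call that targets the pronunciation section itself removes every later sibling in the card. The variant below is a sketch of one possible generalisation, not the project's code. The stop_headings parameter, the heading_label helper, the Local Tip: label, and the sample card markup are all assumptions.

```python
from bs4 import BeautifulSoup

# Assumed heading labels; "Local Tip:" is illustrative only.
STOP_HEADINGS = ("Grammatical Breakdown:", "Pronunciation Guide:", "Local Tip:")


def heading_label(tag):
    """Return the <strong> text if tag is a heading-style <p>, else None."""
    if tag.name != "p":
        return None
    strong = tag.find("strong")
    return strong.get_text() if strong else None


def replace_after_heading(div, heading_text, new_nodes, stop_headings=STOP_HEADINGS):
    heading = None
    for candidate in div.find_all("p", recursive=False):
        label = heading_label(candidate)
        if label and heading_text in label:
            heading = candidate
            break
    if heading is None:
        return False
    sibling = heading.find_next_sibling()
    while sibling:
        label = heading_label(sibling)
        # Stop at any known section heading, not only the pronunciation one.
        if label and any(stop in label for stop in stop_headings):
            break
        next_sibling = sibling.find_next_sibling()
        sibling.extract()
        sibling = next_sibling
    last = heading
    for node in new_nodes:
        last.insert_after(node)
        last = node
    return True


html = (
    '<div class="extra-example">'
    "<p><strong>Pronunciation Guide:</strong></p><p>old guide</p>"
    "<p><strong>Local Tip:</strong></p><p>keep this tip</p>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="extra-example")
new_p = soup.new_tag("p")
new_p.string = "new guide"
replace_after_heading(div, "Pronunciation Guide:", [new_p])
print(div.get_text(" | ", strip=True))
# Pronunciation Guide: | new guide | Local Tip: | keep this tip
```

Whether this change is needed depends on the real card layout: if the pronunciation guide is always the last section in a card, the original stop condition is sufficient.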
Tips
- Parse HTML instead of doing blind string replacement when the structure matters.
- Keep the search scope narrow.
- Preserve the learner sentence.
- Replace only the generated helper sections.
- Write output to a separate file when testing a risky batch.
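The last tip can be supported by a small path helper, so a risky batch never overwrites its source. The enriched_path name and the .enriched suffix below are hypothetical choices made for this sketch.

```python
from pathlib import Path


def enriched_path(source, suffix=".enriched"):
    """Return a sibling path such as page.enriched.html; the source stays untouched."""
    source = Path(source)
    return source.with_name(f"{source.stem}{suffix}{source.suffix}")


print(enriched_path("article-22-manila-daily-home-laundry-bills-and-errands.html"))
# article-22-manila-daily-home-laundry-bills-and-errands.enriched.html
```

Once the output has been reviewed, the same script can write in place by using the source path as the output path.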
Part 5: Validate Enriched Output
Goal
Prove that the batch update touched the expected content.
Development skill
The scripts print summary rows and sanity checks after writing files.
# summary and outputs are assumed to be built earlier in the script, while files are written.
print("Update summary:")
for source, out, total, updated, missing in summary:
    print(f"{source} -> {out}: extra_examples={total}, updated={updated}, missing={missing}")

print("Sanity check:")
for out in outputs:
    soup = BeautifulSoup(Path(out).read_text(encoding="utf-8"), "html.parser")
    divs = soup.find_all("div", class_="extra-example")
    # Count the cards that contain each generated helper section.
    phrase = sum(1 for div in divs if "It is pronounced word by word as:" in div.get_text())
    breakdown = sum(1 for div in divs if "Grammatical Breakdown:" in div.get_text())
    print(f"{out}: extra_examples={len(divs)}, pron_phrase={phrase}, has_breakdown={breakdown}")
Result
The developer can explain the enrichment pipeline as a measurable process, not a manual cleanup.
- Input: article HTML files
- Transformation: grammar and pronunciation regeneration
- Output: updated HTML files
- Evidence: counts for extra examples, pronunciation phrases, and grammar breakdowns
Tips
- Count the exact blocks you changed.
- Print missing sections instead of silently skipping them.
- Keep validation close to the script.
- Use generated counts as technical-sharing evidence.
- Review a few updated cards visually after the script passes.
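The counts are easier to act on when mismatches are reported explicitly. The sketch below assumes a hypothetical find_mismatches helper that receives one row per output file. The file names and numbers in rows are made up for illustration and are not results from the project.

```python
def find_mismatches(rows):
    """Return messages for files whose helper-section counts differ from the card count."""
    problems = []
    for out, total, pron_phrase, has_breakdown in rows:
        if pron_phrase != total:
            problems.append(f"{out}: {total - pron_phrase} card(s) missing a pronunciation phrase")
        if has_breakdown != total:
            problems.append(f"{out}: {total - has_breakdown} card(s) missing a grammar breakdown")
    return problems


# Made-up rows for illustration: (output file, extra_examples, pron_phrase, has_breakdown).
rows = [
    ("article-22.html", 10, 10, 10),
    ("article-23.html", 10, 9, 10),
]

problems = find_mismatches(rows)
for message in problems:
    print("MISSING:", message)
if not problems:
    print("All counts match.")
```

A script could also exit with a non-zero status when problems is not empty, which would make the check usable in an automated run.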
Field Note 1: Enrichment Is A Product Layer
Background: A phrase pair is useful, but grammar and pronunciation turn it into a learning card.
Goal: Add repeatable learning support without changing the sentence-card layout.
Prompt: Generate grammar and pronunciation from the visible Tagalog sentence.
Result: Every extra example becomes more useful for beginners.
Review check: Does the generated helper content explain the actual sentence on the card?
Field Note 2: Batch Scripts Are Reviewable
Background: One script for all 24 articles would be large and hard to tune.
Goal: Keep each batch close to its vocabulary domain.
Prompt: Process only three article files per script.
Result: Community Day, Friendship, and Manila Daily content can each have better local glossaries.
Review check: Can a reviewer understand the vocabulary scope from the file list and comments?
Field Note 3: Fallbacks Need Humility
Background: The pronunciation fallback is useful, but it is not a native-speaker guarantee.
Goal: Give learners a starting point while keeping the review requirement visible.
Prompt: Use curated pronunciation when available and simple fallback when needed.
Result: The site stays useful even before every word has a perfect pronunciation entry.
Review check: Are important event phrases curated instead of relying only on fallback?
Technical Sharing Angle
For a developer talk, this enrichment pipeline is a strong example of practical content engineering:
HTML article files
-> BeautifulSoup parser
-> extra-example blocks
-> Tagalog sentence extraction
-> glossary definitions
-> pronunciation map and fallback
-> section replacement
-> updated HTML files
-> sanity checks
The lesson is simple: AI-assisted learning content still needs deterministic tools. Small scripts can turn generated pages into reviewable educational material.
Closing Reflection
The grammar and pronunciation scripts show a useful middle ground between hand editing and overbuilding. The project does not need a database or a language engine to improve every card. It needs clear article batches, topic glossaries, pronunciation helpers, careful HTML patching, and validation output. That makes the app better for learners and easier to explain to developers.