Updates to character default readings

Each character (單字) in the Cantonese Font receives a default reading. This is the sound shown when the character stands alone.

[TODO] explain the causes of multiple readings in Cantonese — split up into separate article.

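Each choice here is pulled in competing directions (the article explaining some of these factors is getting very long, and is being split off into a separate place). These include:

  • 破讀 (readings that change with meaning or part of speech)
  • 文白異讀 (literary vs colloquial readings)
  • 異讀 (variant readings)
  • 訓讀 (semantic readings)
  • 繁簡 (traditional vs simplified)
  • 「正讀」 (“correct” readings)
  • 「誤讀」 (“mis-readings”)
  • 懶音 (lazy sounds)
  • 唔知點讀 / 有音無字 (reading unknown / a sound with no character)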

Pokfield+ aspires, pragmatically and philosophically, to meet the needs of language learners and teachers at all levels. This means that when I select a “default reading”, I base the choice on who is likely to use the character and how they are likely to use it.

In general, beginner learners (and their teachers) want pronunciations that match how people speak every day. For frequently used words, I therefore favor the colloquial / customary usage. 玩 is assigned waan2, even though its literary / dictionary reading is wun6 (this does not affect its reading as wun6 in words such as 玩具 or 玩家).
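The split between a standalone default and word-level readings can be pictured as a lookup with overrides. The following is only a minimal sketch in Python, not how the font itself is built: the names `DEFAULTS`, `WORD_PATCHES`, and `annotate` are hypothetical, and the tables hold just the 玩 example from this post.

```python
# Hypothetical tables for illustration only; the real font data is far larger.
DEFAULTS = {"玩": "waan2"}            # standalone default reading
WORD_PATCHES = {                      # word -> {position in word: reading}
    "玩具": {0: "wun6"},
    "玩家": {0: "wun6"},
}


def annotate(text, defaults=DEFAULTS, patches=WORD_PATCHES):
    """Return (character, reading) pairs, letting word patches win over defaults.

    Characters with no known reading get None.
    """
    max_len = max((len(word) for word in patches), default=1)
    result = []
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate word first, down to two characters.
        for n in range(min(max_len, len(text) - i), 1, -1):
            word = text[i:i + n]
            if word in patches:
                for k, ch in enumerate(word):
                    result.append((ch, patches[word].get(k, defaults.get(ch))))
                i += n
                matched = True
                break
        if not matched:
            ch = text[i]
            result.append((ch, defaults.get(ch)))
            i += 1
    return result


print(annotate("玩"))    # standalone character uses the default
print(annotate("玩具"))  # the word patch overrides the default for 玩
```

Longest-match-first is one design choice among several; it keeps a longer patched word from being shadowed by a shorter one that shares its first characters.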

(There are exceptions to this “frequent words = descriptive” approach. One example is 咁, which is assigned gam3. The proper usage, up to the 1980s, was that 噉 gam2 and 咁 were two different characters with separate glyphs, sounds, and meanings; there are practical advantages to disambiguating them.)

As we move towards less frequently used characters, I gradually favor dictionary readings. My expectation is that people who encounter a character like 蚤 want to know how it is read, not “what people usually mistake it for”. As a general rule of thumb, characters outside the top 3,000 by usage can be expected to be assigned a dictionary reading.

A pair of examples shows this well: 噢 and 燠 both have a common reading as the exclamation o1 / ou2, and a classical reading as jyu4 / juk1. The latter is used quite infrequently. In this case, 噢 is assigned o1 whereas 燠 is assigned juk1.
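The rule of thumb can be written down as a small function. This is a deliberately crude sketch: the post describes a gradual shift with exceptions, while the code uses a hard cutoff at rank 3,000. The function name `default_reading` and the ranks passed to it are made up for illustration.

```python
def default_reading(rank, colloquial, dictionary, cutoff=3000):
    """Pick a default reading from a frequency rank (1 = most frequent).

    A hard cutoff is an assumption; the actual choices are made by hand
    and shift gradually rather than at a single rank.
    """
    return colloquial if rank <= cutoff else dictionary


# The ranks below are hypothetical, chosen only to land on each side of the cutoff.
print(default_reading(500, colloquial="waan2", dictionary="wun6"))
print(default_reading(6000, colloquial="waan2", dictionary="wun6"))
```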


A further complication is 繁簡 (traditional vs simplified). The general rule of thumb is that the traditional usage is prioritized. For example, 吓 means “to frighten” (嚇, haak3) in simplified usage but is a final particle expressing surprise (haa2) in traditional Hong Kong usage; I assign a default of haa2.

This is, again, by no means iron-clad. A character like 体 has a historical Traditional root, but its meaning (“stupid”) and reading (ban6) are essentially forgotten. I have thus chosen tai3, since the Simplified usage is more closely aligned with expectations / usage.


Changes from 2.0.21.1 → 2.0.22.0

These changes are the result of reviewing the top 8,000 characters between 2024-02-01 and 2024-02-12. Each entry lists the character, its Unicode code point (it’s often not so clear what is being read on screen), the proposed change, and a comment. You are welcome to disagree and make additional comments (perhaps with implications for word additions required).

These changes are expected to be reversible; they will remain in flux until 2024-04-15, when a final version is locked down. After that, the default readings will not change until a major version number (v3 Sycamore family, Summer 2025 at the earliest).
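Since the entries below pair each character with its code point, that pairing can be checked mechanically. A minimal sketch in Python, using only the built-in `ord`; the helper names `codepoint_hex` and `matches_listing` are hypothetical.

```python
def codepoint_hex(char):
    """Return the Unicode code point of a single character as uppercase hex."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    return f"{ord(char):04X}"


def matches_listing(char, listed_hex):
    """Check that a character matches the code point listed next to it."""
    return codepoint_hex(char) == listed_hex.upper()


print(codepoint_hex("咁"))            # 5481, as listed in entry 1
print(matches_listing("會", "6703"))  # True
```

This is most useful for look-alike pairs such as 為 / 爲, where the glyphs are hard to tell apart on screen but the code points differ.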

  1. 咁, 5481, gam2 → gam3. Both senses are frequently needed; suggest educating users to favor 噉 for the sense of “well, then…”

  2. 會, 6703, wui6 → wui5. Standalone usage is more likely in the sense of “to know how to” (我會跳舞) or “possibly” (她會來嗎). The sense of “meeting” usually appears in words: 會見, 研討會, 歲會, 會面, 會師, 會議, 會客, 宴會, 會合. Also need to check / add 領會, 會意, 機會, 適逢其會, 會計, 會鈔, 會賬. Contexted next to 會同.

  3. 為, 70BA; 爲, 7232, wai4 → wai6. The sense of “for…” / “because” is more frequent. Need to patch words: 作為, (事在)人為, 為頭, 為政, 為伍, 為難, 為鬼, 為域, 為虎作倀, 何爲.

  4. 下, 4E0B, haa5 → haa6, contexted? Needs to patch: [1-10] 一下… + 兩下 + 百下 + 千下 + 幾(多)

  5. 生, 751F, sang1 → saang1. Prefers colloquial over literary for a frequent character. Need to patch: 生活, 生命, 生產, 生死, 生利, 出生, 生成.

  6. 成, 6210, seng4 → sing4. Needs to patch: 成日, 成個.

  7. 呀, 5440, aa1 → aa3. UNSURE.

  8. 行, 884C, hang4 → haang4. Prefer colloquial. Patch hang4: 行走, 步行, 發行, 行為, 行動, 行將, 行年, 進行, 行李, 行裝, 行文, 行刑, 旅行, 實行, 五行, 琵琶行, 兵車行, 行車, 行色匆匆; patch hong4: [1-10, 兩 + 百 + 千 + 幾(多)]

  9. 黎, 9ECE, lai2 → lai4. Mistake. Contexted next to 老

  10. 正, 6B63, No change (keep zing3). Patch for colloquial zeng3 uses: 正呀, 正喎

  11. 相, 76F8, No change (keep soeng1). Patch for soeng3: 相貌, 相馬, 相機而動, 丞相, 照相, 相公, 相夫教子; soeng2 照相, [1-10, 兩 + 百 + 千 + 幾(多)]張相.

  12. 咪, 54AA, mai1 → mai5. Infrequently used as “mic” / “mile”; frequent use as “don’t”. Patch mai6: 咪就係, 咪又, 係唔係, 係咪. No easy distinct patch for mai1.

  13. 重, 91CD, cung5 → zung6. Patch cung5: 舉重, 重金, 輕重, 重量, [1-10, 兩 + 百 + 千]斤重, [1-10, 兩 + 百 + 千]両重, [1-10, 兩 + 百 + 千]磅重, [1-10, 兩 + 百 + 千 + 幾(多)]克重.

  14. 種, 7A2E, no patch (zung2). Too many as measure word 兩種人. Patch zung3 種植, 種痘, 種花, 種菜, 種樹, 種米, 種稻

  15. 更, 66F4, gang1 → gang3. Too many isolated uses for gang3 “to a greater extent”. Patch gang1: 更換, 更改, 變更, 更新, 更正, 少不更事; patch gaang1: [1-10]更, 看更, 更鼓, 打更, 夜更, 日更

  16. 長, 9577, no change (keep coeng4). Patch zoeng2: 生長, 消長, 長子, 長女, 長孫, 長者, 家長, 長輩, 處長, 部長, 尊長, 長大, 長成, 長進, 師長, 長幼

  17. 平, 5E73, ping4 → peng4. Compounds take ping4; peng4 tends to appear in isolation. Patch ping4: 平滑, 平整, 平坦, 平分, 平等, 平均, 平安, 平靜, 平穩, 心平(氣和), 平定, 平叛, 平息, 平時, 平常, 平日, 平聲, 平衡, 公平, 昇平, 平正

  18. 名, 540D, ming4 → meng2. Favors colloquial; more compounds are available for ming4. Patch ming4: 姓名, 名譽, 著名, 名勝, 名目, 名山, 名士, 名號, 功名, 名稱, 名分, 名義, 名物, 名堂

  19. 玩, 73A9, wun6 → waan2. Favors colloquial; more compounds are available for wun6. Patch wun6: 玩味, 古玩, 玩世(不恭), 玩弄, 玩索, 玩耍, 玩具, 玩賞, 玩月, 玩物, 玩偶, 玩戲

  20. 著, 8457. No change (keep zyu3). Should favor 異體字 着 for wear zoek3. Patch zoek3 for: 穿著, 衣著, 著衫.

  21. 轉, 8F49. No change (keep zyun2). May need to revisit.

  22. 命, 547D, ming6 → meng6. Favors colloquial; clean compounds for ming6. Patch ming6: 生命, 命令, 革命, 命運, 命途, 性命, 天命

  23. 請, 8ACB, cing2 → ceng2. Favors colloquial. Patch cing2: 請教, 請安, 造請, 請求, 請命.

  24. 樓, 6A13, no change (keeps lau4). Patch lau2: [1-10, 幾(多)]樓. (Example of 變調 that doesn’t show up in dictionaries.)

  25. 量, 91CF, no change (keep loeng4). Patch loeng6: 膽量, 酒量, 飯量, 音量, 大量, 少量, 多量, 衡量, 量力(而為), 量才, 度量, 容量, 質量

  26. 房, 623F, no change (keep fong4). Patch for fong2: [1-10, 兩, 幾(多)]房, 廚房, 客房, 主人房, 睡房

  27. 頂, 9802. No change (keep ding2). Patch deng2: 頭頂, 屋頂, 山頂, 禿頂, [1-10, 兩, 幾(多)]頂(帽)

  28. 牌, 724C. No change (keep paai4). Patch for paai2: 發牌, 抽牌, 出牌, 洗牌, 打牌, 排牌, 派牌, 啤牌, 疊牌, [ x ]隻牌, [ x ]副牌.

  29. 彈, 5F48. No change (keep daan6). Patch for taan4: 彈劾, 彈唱, 彈琴, 彈棉花, verb more likely than daan6: 不彈 / 唔彈 / 彈唔彈 / 識彈 / 彈咗 / 彈完 / 彈埋 / 彈嚟 / 彈過. Patch for daan2: 炸彈, 子彈, 手榴彈, 導彈, 飛彈, 發彈, 閃光彈, 催淚彈

  30. 朝, 671D. No change (keep ciu4). Patch for ziu1: 朝早, 朝頭早, 今朝, 朝夕, 朝鮮, 朝會, 朝陽, 朝氣, 朝不保夕

  31. 魚, 9B5A. No change (keep jyu4), Patch for jyu2 (變調): 釣魚, 養魚, 食用魚, 海魚, 鯡魚, 鱈魚, 鯷魚, 金槍魚, 比目魚, 鮭魚, 魷魚, 娃娃魚, 三文魚, 鯰魚, 鯉魚, 左口魚, 炸魚, 水煮魚, 煎魚, 蒸魚, 燻魚, 烤魚, 淡水魚, 黃花魚, 紅衫魚, 䱽魚, 水魚

  32. 蛋, 86CB, daan6 → daan2. Favors colloquial.

  33. 盤, 76E4, no change (keep pun4). patch for pun2: 筍盤.

  34. 橋, 6A4B, no change (keep kiu4) Patch for kiu2: 好橋, 屎橋, () 吓橋, (有)冇橋, 橋唔怕舊.

  35. 豆, 8C46, dau6 → dau2. Favors colloquial. Patch for dau6: 豆苗, 豆丁, 豆釘, 豆卜, 豆惡, 豆腐, 豆皮, 豆瓣醬

  36. 檔, 6A94, dong2 → dong3. More unpredictable patching as measure word. Patch for dong2: 檔案, 歸檔, 查檔, 存檔, 文件檔

  37. 劈, 5288, pik1 → pek3. Favors colloquial. No known patches?

  38. 糊, 7CCA, no change (keep wu4) Patch for wu2: 芝麻糊 核桃糊 腰果糊 杏仁糊 糊仔

  39. 扑, 6251, bik1 (mistake in Table1?) → bok1. Patch for literal pok3: 扑滅, 扑打, 鞭扑,

  40. 戇, 6207, zong3 (罕讀) → ngong3.

  41. 泡, 6CE1. No change (keep paau1). Not clear what is the optimal play here.

  42. 擴, 64F4, kwok3 → kwong3. Favors common/colloquial.

  43. 廟, 5EDF, miu6 → miu2. Favors colloquial; many combinations of [zz]廟. Patch miu6: 廟宇, 宗廟, 廟堂, 廟祝

  44. 繩, 7E69, sing4 → sing2. Favors common/colloquial. Patch sing4: 繩索, 繩守, 繩縛, 繩之於法

  45. 檸, 6AB8, ning4 → ning2. Many combinations with ning2 (青檸, 檸茶, 檸水…), only one for 檸檬.

  46. 眨, 7728, zaap3 → zaam2. Favors colloquial. No patch for zaap3 needed.

  47. 匙, 5319, ci4 → si4. Few uses with ci4 (匙羹 / 湯匙 / 茶匙), whereas si4 is used more flexibly.

  48. 撈, 6488, laau4 → lou1. Favors colloquial. Patch laau4: 海底撈月, 打撈, 大海撈針, 撈魚

  49. 寺, 5BFA, zi6 → zi2. Favors colloquial, many [zz]寺 combinations. Patch zi6: 寺觀, 寺院, 古寺, 寺僧, 寺廟, 寺人

  50. 礦, 7926, kong3 → kwong3. Favors colloquial. No patches needed.

  51. 姨, 59E8, ji4 → ji1. Clear contexts for ji4: 姨媽, 姨娘, 姨太, 姨婆

  52. 嶺, 5DBA, ling5 → leng5. Favors colloquial, more [zz]嶺. Patch ling5: 嶺南

  53. 簿, 7C3F, bou6 → bou2. Favors colloquial. No patch?

  54. 漲, 6F32, zoeng2 → zoeng3. More standard zoeng3. No patch?

  55. 墊, 588A, dim3 → din3.

  56. 什, 4EC0. No patch (keeps sap6); the other sounds have clear context compounds; sap6 is more flexible and harder to hit. Patch for zaap6: 什物, 什錦, 家什. Patch for sam6: 什麼

  57. 隙, 9699, gwik1 → kwik1. Contemporary reading. No patch.

  58. 鋸, 92F8, geoi3 → goe3. Favors colloquial. No patch.

  59. 盎, 76CE, ong3 → joeng1. Favors colloquial. Not clear when ong3 is preferred?

  60. 扳, 6273, no change (paan1). Patch baan2: 扳手

  61. 摒, 6452, bing3 → bing2. Favors colloquial. No patch?

  62. 蔣, 8523, zoeng1 → zoeng2. Correction?

  63. 姥, 59E5, mou5 → lou5. Patch mou5: 天姥(山/峯)

  64. 氓, 6C13, maang4 → man4. Favors colloquial. No patch.

  65. 应, 5E94, jing3 → jing1. Align with traditional 應.

  66. 俺, 4FFA, aan2 → jim3. Table1 error? No patch.

  67. 椎, 690E, zeoi4 → zeoi1. Higher frequency for use as spine zeoi1. Patch for ceoi4: 鐵椎, 椎打, 木椎, 椎魯

  68. 焯, 712F, zoek3 → coek3. Favors common use. Patch for zoek3: 燒灼, 焯著, 焯見

  69. 踁, 8E01, hing5 → ging3. Error in Table1?

  70. 广, 5E7F, jim5 → gwong2. One of the cases where Simplified is preferred (needs escaping at code stage, also with the Unicode!)

  71. 听, 542C, jan5 → ting1. One of the cases where Simplified is preferred (needs escaping at code stage, also with the Unicode!)
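Several patches above use a bracket shorthand such as [1-10, 幾(多)]樓 or [1-10, 兩, 幾(多)]房. One way to read that shorthand is as a regular expression. The sketch below is in Python and rests on assumptions: that “1-10” means the Chinese numerals 一 to 十 (the list does not say whether Arabic digits also count), and that 幾(多) means 幾 or 幾多. The helper name `counter_pattern` is hypothetical and not part of the font tooling.

```python
import re

# Assumption: "1-10" in the shorthand refers to these Chinese numerals.
NUMERALS_1_TO_10 = "一二三四五六七八九十"


def counter_pattern(suffix, extra_chars="", allow_gei=True):
    """Build a regex for shorthand like [1-10, 兩, 幾(多)] followed by a suffix.

    extra_chars: additional single characters allowed before the suffix (e.g. "兩百千").
    allow_gei:   whether 幾 or 幾多 may also precede the suffix.
    """
    head = f"[{NUMERALS_1_TO_10}{extra_chars}]"
    if allow_gei:
        head = f"(?:{head}|幾多?)"
    return re.compile(head + re.escape(suffix))


floor = counter_pattern("樓")                   # [1-10, 幾(多)]樓
room = counter_pattern("房", extra_chars="兩")  # [1-10, 兩, 幾(多)]房

print(bool(floor.fullmatch("三樓")))    # True
print(bool(floor.fullmatch("幾多樓")))  # True
print(bool(floor.fullmatch("大樓")))    # False
print(bool(room.fullmatch("兩房")))     # True
```

Expanding the shorthand into an explicit word list would work just as well; the regex form only makes the intended set of matches easy to inspect.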

Updates 2.3.0 → 2.4.1

  1. 橛, 6A5B, kyut3 → gyut6. Favors common use (common spoken but not written); measure words. Patch for kyut3: 木橛, 銜橛
  2. 斷, 65B7, dyun6 → tyun5. Favors colloquial. Patch for dyun6: 折斷, 斷水, 斷糧, 斷煙, 斷酒, 中斷, 一刀兩斷, 割斷, 當機立斷, 優柔寡斷, 診斷
  3. 斜, 659C, ce4 → ce3. Favors colloquial. Patch for ce4: 傾斜, 斜坡, 斜陽, 歪斜, 斜風細雨
  4. 和, added wo2 for 怡和
  5. 㗱, 35F1, added zep4 for 㗱㗱聲