Analysis Results of The Hongkong News
Result of Post-OCR
Because the OCR text provided by the library was not accurate enough, we processed the raw page images with three different OCR engines and combined their outputs into a single corrected text. The table below compares a sample passage before and after this post-OCR step.
Original OCR result | After post-OCR |
At the out.set of his speechfessor Enomoto declared thht Ptroi:lt.obe greatly regretted th,~tldj~~~21and the United States shoukgone to war with one anotherdespite the fact that they had beendestined to co-operate for thservation of peace He said that it is but nthat these warring nations :resort to t.he use of the late:most e靠ective weapons fo:overthrow af each other, butmust be limits to be strictly ad-lhered to,:.and distinction shouLd belstrictly made between combatants)and non-combatants. He stated that from past ex-periences, he had expected that th e’Amerioan forces would strictlyabide by the rules of humane xvar-fare in pursuing this war, but theprogress of the war has revealedthe contrary. | At the outset of his speech, Pro-fessor Enomoto declared that it is to be and the United States should have gone to war with one another despite the fact that they had been destined to co-operate for the pre-servation of peace in the Pacife. He said that it is but natural that these warring nations should resort to the use of the latest and most effective weapons for the overthrow of each other, but there must be limits to be strictly ad-hered to, and distinction should be strictly made between combatants and non-combatants. Hestated that from past ex-eriences, he had expected that the pursuing this war, but the progress of the war has revealed the contrary. |
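The method used to combine the three engines is not specified here, so the following is only a minimal sketch of one common approach: a word-level majority vote. It assumes the first engine's output serves as the alignment pivot; the function name `combine_ocr` and the sample strings are illustrative, not part of the project.

```python
from collections import Counter
from difflib import SequenceMatcher


def combine_ocr(outputs):
    """Word-level majority vote across OCR outputs, aligned to the first one."""
    pivot = outputs[0].split()
    # votes[i] collects every candidate reading for pivot word i.
    votes = [[word] for word in pivot]
    for other in outputs[1:]:
        words = other.split()
        matcher = SequenceMatcher(a=pivot, b=words, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only vote where the aligned spans have the same length;
            # insertions, deletions and uneven replacements are skipped.
            if tag in ("equal", "replace") and i2 - i1 == j2 - j1:
                for offset in range(i2 - i1):
                    votes[i1 + offset].append(words[j1 + offset])
    # On ties, most_common keeps first-seen order, so the pivot reading wins.
    return " ".join(Counter(v).most_common(1)[0][0] for v in votes)


# Hypothetical outputs from three engines for the same line.
samples = [
    "At the out.set of his speech",
    "At the outset of his speech",
    "At the outset of hls speech",
]
print(combine_ocr(samples))
```

A vote like this only repairs words that at least two engines read correctly, and it cannot recover words the pivot engine dropped entirely, which may be one reason some gaps remain in the corrected column above.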
Result After NLP
After obtaining more accurate text from the post-OCR step, we applied NLP to extract keywords from each daily issue of The Hongkong News. Part of the result is shown below.
Part of the Keyword Results
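The exact NLP method is not described, so the sketch below uses plain term frequency with a stopword filter as a stand-in for per-day keyword extraction. The stopword list, the function names, and the input layout (a dict mapping a date string to that day's corrected text) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real one would be far longer).
STOPWORDS = {
    "the", "of", "and", "to", "a", "in", "that", "it", "is", "be", "for",
    "had", "he", "his", "with", "but", "they", "been", "this", "has", "from",
}


def extract_keywords(text, top_n=10):
    """Return the top_n most frequent non-stopword tokens in one day's text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(top_n)


def keywords_per_day(issues, top_n=10):
    """issues maps a date string to the corrected text of that day's issue."""
    return {day: extract_keywords(text, top_n) for day, text in issues.items()}
```

A real pipeline would more likely rely on an NLP library for tokenization, lemmatization, and named-entity or TF-IDF based ranking; the frequency count here only shows the shape of the per-day output.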
To present the keywords more intuitively, we generated bar charts of keyword frequency for several designated periods. The chart for the longest time interval is shown below.
Keyword Frequency (Top 50) for the Designated Period (1942-1945)
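The data behind such a chart can be produced by summing the per-day keyword counts over the chosen period and keeping the most frequent entries. The sketch below assumes the daily counts are stored as a dict keyed by `datetime.date`; the function name and the sample numbers are made up for illustration.

```python
from collections import Counter
from datetime import date


def top_keywords(daily_counts, start, end, top_n=50):
    """Sum per-day keyword counts for days in [start, end]; return the top_n."""
    total = Counter()
    for day, counts in daily_counts.items():
        if start <= day <= end:
            total.update(counts)  # adds counts rather than replacing them
    return total.most_common(top_n)


# Made-up counts, only to show the expected input shape.
daily = {
    date(1942, 1, 1): {"war": 3, "peace": 1},
    date(1943, 6, 1): {"war": 2, "navy": 4},
    date(1946, 1, 1): {"war": 10},  # outside the period, ignored below
}
bars = top_keywords(daily, date(1942, 1, 1), date(1945, 12, 31), top_n=2)
```

The resulting list of (keyword, frequency) pairs can be passed directly to any plotting library's bar chart function.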
Visualization Result
Building on the steps above, we developed a visualization tool to better present the data. The tool displays the keyword frequency bar chart for a time range chosen by the user. When a user clicks a word in the bar chart, the tool shows links to the related pages of The Hongkong News in the CUHK Digital Repository.
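The click-to-links behaviour can be backed by an inverted index from each keyword to the pages that contain it. The sketch below is one possible data structure, not the tool's actual implementation; the function names are hypothetical and the URLs are placeholders rather than real repository addresses.

```python
from collections import defaultdict
from datetime import date


def build_index(pages):
    """pages: iterable of (issue_date, url, keywords) tuples."""
    index = defaultdict(list)
    for issue_date, url, keywords in pages:
        for word in set(keywords):  # record each page once per keyword
            index[word].append((issue_date, url))
    return index


def links_for(index, word, start, end):
    """URLs of pages containing `word` within [start, end], oldest first."""
    hits = [(d, u) for d, u in index.get(word, []) if start <= d <= end]
    return [u for _, u in sorted(hits)]


# Placeholder pages; example.org stands in for the repository.
pages = [
    (date(1943, 5, 2), "https://example.org/page/2", ["war", "navy"]),
    (date(1942, 3, 1), "https://example.org/page/1", ["war", "peace"]),
]
index = build_index(pages)
links = links_for(index, "war", date(1942, 1, 1), date(1945, 12, 31))
```

Keeping the issue date next to each URL lets the same index serve any time range the user selects without rebuilding it.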