Visual Novel OCR (VNO), the software and the movement, represents a new approach to grabbing text from visual novels and dialogue-heavy games in general. As long as the game is on your screen, VNO will work. An increasingly common trend is game streaming/cloud gaming: game content is processed on a different machine (PS4, Xbox, server, etc.) and only video frames are sent back to your local machine. A text hooker cannot extract text from image or video data (think MP4, PNG, or JPEG), so in this case the only solution is OCR technology.
Higurashi no Naku Koro ni Hou streamed from PS4 to PC (played with Visual Novel OCR)
OCR stands for “optical character recognition” or, to put it simply, image to text. Visual Novel OCR leverages Tesseract 5, one of the best open-source OCR engines available, along with pre-trained models for Japanese horizontal and vertical text recognition.
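For readers curious what a Tesseract call for Japanese text looks like, here is a minimal sketch using the Tesseract command line from Python. It is not how VNO itself invokes the engine (that code is not shown here). The function names `build_tesseract_command` and `recognize` are made up for illustration, and the sketch assumes the `tesseract` binary and the standard `jpn` and `jpn_vert` language data are installed.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(image_path: str, vertical: bool = False) -> List[str]:
    """Build a Tesseract CLI call that prints recognized Japanese text to stdout."""
    # "jpn" is the horizontal model, "jpn_vert" the vertical one.
    lang = "jpn_vert" if vertical else "jpn"
    # Page segmentation mode 6 = single uniform block of text,
    # mode 5 = single uniform block of vertically aligned text.
    psm = "5" if vertical else "6"
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", psm]


def recognize(image_path: str, vertical: bool = False) -> Optional[str]:
    """Run Tesseract if it is installed; return None when it is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, vertical),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `recognize("dialogue.png")` on a cleaned-up crop (black text on a white background) would return the extracted text, or `None` if Tesseract is not installed.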
Use both because why not. Users who downloaded both Textractor and Visual Novel OCR are reported to be happier because they don’t have to decide what to use, and now they can play all the visual novels they want.
A demo video showcasing Visual Novel OCR on various games and emulators can be found here: Demo Video (this is also where I post links to updates).
Download links are included in the description of the above video and updates will always be posted there.
The program doesn’t require any installation. The only exception is that, on first use, you will need to permit “NodeJS” to run on your machine. This is the back end of VNO, which handles online translation and connects the various moving components into one cohesive package.
When you open the program, you should see the main menu window and Translation Aggregator (a very flexible piece of software).
As mentioned in the previous section, the text capture window (mirror screen capture) is one of the two key features of Visual Novel OCR. It is a sort of permanent mirror/window lying on top of the dialogue section of the game. This allows users to conveniently get the coordinates of the text area and extract the content inside. First, drag and resize the text capture window to fit the dialogue section.
Then, click “transparent” to turn it into a see-through window.
As you can see from the picture, there are three buttons corresponding to three things you can do.
Go back to the semi-transparent window so you can move it around.
Lets you crop the text inside the window. You can also press the backtick (`) or TAB key instead of clicking. (On the first attempt, you need to set the color contrast threshold manually; more on that later.)
Lets you crop/snip any area on the screen, such as menu items or choices.
Translated text after setting an optimal color contrast threshold.
Crop in-game choices
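Conceptually, the capture window boils down to a rectangle of screen coordinates, and cropping means keeping only the pixels inside it. The sketch below illustrates that idea on a frame stored as nested lists of RGB tuples. The `Frame` representation and the `crop` function are assumptions made for illustration, not VNO's actual capture code.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Frame = List[List[Pixel]]      # rows of pixels, indexed as frame[y][x]


def crop(frame: Frame, left: int, top: int, width: int, height: int) -> Frame:
    """Return the pixels inside the rectangle, clamped to the frame bounds."""
    frame_h = len(frame)
    frame_w = len(frame[0]) if frame else 0
    # Clamp the rectangle so a window hanging off-screen does not wrap around.
    x0 = min(max(0, left), frame_w)
    y0 = min(max(0, top), frame_h)
    x1 = max(x0, min(frame_w, left + width))
    y1 = max(y0, min(frame_h, top + height))
    return [row[x0:x1] for row in frame[y0:y1]]


# A tiny 4x3 "screen" where each pixel encodes its own position.
screen = [[(x, y, 0) for x in range(4)] for y in range(3)]
dialogue_area = crop(screen, left=1, top=1, width=2, height=2)
```

Because the window stays in place, the same rectangle can be reused for every new line of dialogue, which is what removes the need to redraw the selection each time.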
The color contrast threshold is the most important feature of Visual Novel OCR. It is a window that lets you adjust color contrast properties so that the final output is an image with black/colored text on a white background.
A sample setting
Human eyes can discern text from the background because text has distinct shapes and is usually also “brighter”. VNO focuses on the color aspect, summed up as HSV (hue, saturation, and value, i.e. the color, saturation, and brightness of pixels in the game window). Usually, with saturation, brightness, or a combination of the two, users can capture all the text in a game. This setting requires a bit of practice but is not difficult. One thing to note: thin lines yield better results than thick lines.
Thick line (can lead to inaccurate kanji)
Thin line (much better now)
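The saturation/brightness thresholding described above can be sketched in a few lines of Python. This is an illustration of the general technique under assumed settings, not VNO's actual implementation. The function names and the default thresholds (`min_value`, `max_saturation`) are hypothetical, chosen for bright, nearly colorless text on a darker background.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Frame = List[List[Pixel]]      # rows of pixels, indexed as frame[y][x]

BLACK: Pixel = (0, 0, 0)
WHITE: Pixel = (255, 255, 255)


def is_text_pixel(pixel: Pixel, min_value: float = 0.8,
                  max_saturation: float = 0.3) -> bool:
    """Treat bright, weakly saturated pixels as text (assumed thresholds)."""
    r, g, b = (channel / 255.0 for channel in pixel)
    _hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    return value >= min_value and saturation <= max_saturation


def binarize(frame: Frame, min_value: float = 0.8,
             max_saturation: float = 0.3) -> Frame:
    """Turn text pixels black and everything else white, ready for OCR."""
    return [
        [BLACK if is_text_pixel(p, min_value, max_saturation) else WHITE
         for p in row]
        for row in frame
    ]


# White text pixel, dark blue background pixel, bright red decoration pixel.
sample = [[(255, 255, 255), (20, 30, 90), (255, 0, 0)]]
cleaned = binarize(sample)
```

Games with colored or dark text need different thresholds, for example a hue range or a maximum brightness, which is why the setting has to be tuned per game.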
Translation Aggregator is a great program that can do many things. It can hook text, get translations from many online services, and act as a very useful dictionary. The bundled Translation Aggregator mainly serves the third function, but technically you can customize it freely.
Previously, there were two major ways to understand Japanese games. The first is to wait for a fan or official translation (which requires a substantial amount of patience), and the second is to use text hooker software like VNR or Textractor. The latter approach works by injecting a monitor script into the running game to find and “hook” text data, mainly dialogue, for dictionary lookup or direct machine translation. This method works very well with common game engines like Kirikiri (Fate/stay night) or Ren'Py (Doki Doki Literature Club). However, it becomes very complicated or impossible when handling newer engines, in-house engines, or emulators.
Unsuccessful attempt to hook text on Suikoden 1 (RetroArch emulator)
VNO is not the first tool to offer OCR functionality that works with visual novels. Some general OCR tools like Capture2Text let users snip the screen for text extraction, and even for direct translation as text hooker software does. However, the capture process is tedious, requiring users to constantly drag a rectangle over each new line of dialogue. On top of that, accuracy can range from quite bad to really bad when the text sits on a somewhat transparent background, which is common in visual novels.
Capture2Text doesn’t work with semi-transparent backgrounds
Visual Novel OCR solves these two issues, finally making OCR a user-friendly alternative to text hooking. The two main mechanisms involved are mirror screen capture and color contrast threshold.
-Can I translate from Japanese to another language?
Yes, you can. Go to the main menu and change from English to your preferred language.
-What if I only want the Japanese text and don’t care about machine translation?
I got you, buddy. There are two ways to do it. The first is to disable Wi-Fi, so you receive the extracted text with a blank translation. The second is to change the translation language to Japanese.
-The color contrast setting is complicated. How can I get help?
No worries, you are not alone. Hop on the Discord group for help; there are members there who might own the games you play or have experience with the kind of text you are dealing with. In addition, I include a folder full of reference images inside the program, so look it up.
-I saw your demo for Suikoden I and Sakura Taisen. Does that mean VNO also works with RPGs?
Yes and no. With the mirror screen capture and crop function, it is possible to handle all text in a game. However, if your game is dynamic or has too many text items, it might be a hassle or not practical.