Fate of Atlantis ScummTR translation details

hetmes · Post by **hetmes** » Mon Dec 16, 2024 12:14 pm

I'm translating Indiana Jones and the Fate of Atlantis into Dutch using ScummTR to do the heavy lifting. I am following the extract/translate/ingest procedure, and I can play the translated game. However, I do have some questions.

Sample input for translations

```
a1-cageroom-robot-smash
\255\010\172\186\255\010\112\000\255\010\010\000\255\010\000\000It's loose on the\016floor.
hinge pin
\255\010\245\162\255\010\114\000\255\010\010\000\255\010\000\000I can't reach it from\016here.
\255\010\027\216\255\010\114\000\255\010\010\000\255\010\000\000It's the tunnel leading back where I came\016from.
tunnel
\255\010\215\021\255\010\114\000\255\010\010\000\255\010\000\000I can't reach it from\016here.
\255\010\253\074\255\010\114\000\255\010\010\000\255\010\000\000It's the tunnel leading back where I came\016from.
tunnel
\255\010\077\107\255\010\116\000\255\010\010\000\255\010\000\000Not much left of him.
crushed guard
Guard@@@@@@@@@
Guard@@@@@@@@@
sentry statue
crab
```

Open questions

1. What are the long escape sequences before most of the strings? Do I need to bother with them?
2. What is the \016 escape sequence often instead of the last space?
3. Do I need to update the string lengths somewhere?
4. When using the translations, using the in-game save crashes the game. ScummVM state save and load work fine, but they don't reload the translations, (I think?)
5. Can/should I also translate things like
```
a1-cageroom-robot-smash
```
which look like headers the game might look for?
6. When dealing with @-padded strings, how do I know that
```
guard@@@@@@@@@
```
and
```
crushed guard
```
are of the same 'group' of fixed-length strings?
7. Do these translations already exist somewhere? If not, I'd be happy to contribute it once I'm done.
8. How do I find the button caption offsets in the binary? Translated caption strings are of different length, and one would like to center them again.

hetmes · Post by **hetmes** » Mon Dec 30, 2024 9:05 am

Small progress update. I'm playing through the translated game, and although I haven't completed it yet, I have reached the submarine and have checked, I hope, most of the interesting translations by hand along the way.

1. I imagine some kind of ID and / or screen position data?
2. Perhaps a line break indicator, although some testing didn't seem to confirm that.
3. Looks like that's not needed.
4. This is no longer an issue, although when too many things change, old saves don't load anymore.
5. Looks like those can be left alone.
6. I've tried to keep strings that are similar and close together of the same length using @-padding, but not rigidly. So far, no apparent issues with memory corruption.
7. Haven't found any, but haven't been looking.
8. I am using gdb to debug the ScummVM C++ source code, but haven't been able to trace the values back to file storage yet; I couldn't find the byte sequences the script reader produces in the ATLANTIS.00X files. I eventually just hardcoded something in the drawVerb function, because I wanted to run the program and get the context from the game to check the translations.

As I gain experience, I do have other questions:
9. Is there a way to construct the text graphs from the output of ScummTR? That way, I could just glance over the conversations to gather the tone and exact meaning. Without context, things like "Well, now", "Hold on", "Come on" and many other shorter phrases just become impossible to accurately translate. Also, in my case, Dutch distinguishes formal from informal 'you', and it would make the whole translation so much better if I knew who was talking to whom.

LogicDeLuxe · Post by **LogicDeLuxe** » Wed Jan 01, 2025 1:11 pm

1. These are the voice instructions with pointers into monster.sou. If you're only doing a text translation while keeping the English voice acting, you should leave them as they are.

Tsomi · Post by **Tsomi** » Sat Jan 04, 2025 1:51 am

Hi,

ScummTR maintainer here. Thanks for this very interesting set of questions, I'll try to help you as much as I can.

(Although posting questions here means that they will be properly indexed by Google, it's probably better to ask your questions on the ScummVM Discord server, which has much more visibility and activity than the forums nowadays. Also, for ScummTR itself, opening a new discussion on the project is recommended.)

In general, regarding ScummTR usage:

Make sure you're using the latest ScummTR release -- some people still use the scummtr.exe program from 2003/2004, but newer releases with some bugfixes were made after 2020.
Have a look at its FAQ and manual pages; some common questions are referenced there (the FAQ can be improved, I just need more feedback or contributions for that).
Also have a look at NUTCracker, which is a newer, more-maintained alternative to ScummTR, and which lets you make more kinds of SCUMM resource changes than ScummTR/ScummRP/ScummFont. (Currently, it also supports Humongous Entertainment titles, but doesn't yet support all the pre-Monkey 1 CD LucasArts titles that ScummTR supports.)

So, now, back to your questions…

1. What are the long escape sequences before most of the strings? Do I need to bother with them?

I imagine some kind of ID and / or screen position data?

As LogicDeLuxe said, in the cases you've shown, they're special options to trigger the audio lines spoken when the text is displayed. You don't need to bother with them, and more importantly you shouldn't change these escape sequences at all, otherwise you risk "breaking" the audio lines.

The basic idea is: "if a line starts with a long series of escape sequences in a talkie title, leave this escape sequence the way it is".

(It's -- a bit -- covered in this part of the FAQ, but I guess I could improve it in some way.)
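A minimal Python sketch of that rule, for checking a translated file before import. It assumes the voice prefix always has the shape seen in the sample lines of this thread (repeated groups of `\255\010` followed by two three-digit escapes); the function names and the Dutch sentence are made up for illustration, and none of this is part of ScummTR itself.

```python
import re

# Assumption: the voice prefix is one or more groups of the form
# \255\010\DDD\DDD (decimal escapes), as in the sample lines of this thread.
VOICE_PREFIX = re.compile(r"^(?:\\255\\010\\\d{3}\\\d{3})+")


def split_voice_prefix(line):
    """Return (prefix, text); prefix is '' when the line has no voice prefix."""
    match = VOICE_PREFIX.match(line)
    if match is None:
        return "", line
    return match.group(0), line[match.end():]


def prefix_preserved(original, translated):
    """True when both lines start with exactly the same voice prefix."""
    return split_voice_prefix(original)[0] == split_voice_prefix(translated)[0]


original = r"\255\010\077\107\255\010\116\000\255\010\010\000\255\010\000\000Not much left of him."
prefix, text = split_voice_prefix(original)

# Translate only the text part and glue the untouched prefix back on.
translated = prefix + "Er is niet veel van hem over."

print(text)                                    # Not much left of him.
print(prefix_preserved(original, translated))  # True
```

Running `prefix_preserved` over every original/translated pair is a cheap way to catch a prefix that was accidentally edited.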

2. What is the \016 escape sequence often instead of the last space?

Perhaps a line break indicator, although some testing didn't seem to confirm that.

It's a special typography feature which, as far as I know, only exists in the official releases of Indy4. It's a way of having a non-breaking space, basically. It's very useful in some languages (e.g. French), and it can also help you make sure that a newline will never be inserted between two words.

It looks like whoever dealt with the typography in Indy4 disliked "runts".

If you don't care about this, you can just replace the \016 with a plain space character. Or do some tests in-game to see what it does (i.e. write a long sentence, and have Indy read it while being close to the edge of the screen).
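Both options can be sketched in a few lines of Python, working on the text as it appears in the ScummTR export (where the character is written as the four characters `\016`). The helper names are hypothetical, and the second helper only imitates the pattern visible in the samples, where the last space of a line is the protected one.

```python
NBSP = r"\016"  # textual escape for the Indy4 non-breaking space, as shown in the export


def to_plain_spaces(text):
    """Replace every \\016 escape with an ordinary space."""
    return text.replace(NBSP, " ")


def protect_last_space(text):
    """Replace the last ordinary space with \\016 so the final word cannot wrap alone."""
    head, sep, tail = text.rpartition(" ")
    if not sep:
        return text  # single word, nothing to protect
    return head + NBSP + tail


print(to_plain_spaces(r"It's loose on the\016floor."))
print(protect_last_space("I can't reach it from here."))
```

Apply these to the text part of a line only, not to a leading voice prefix.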

On the other hand, if you do like non-breaking spaces, it's possible to import this special character into most of the other SCUMM titles. So far, only French people have asked me for this.

Speaking of "line break indicator", have a look at the FAQ explanations for the \255\001 (and so on) sequences.

3. Do I need to update the string lengths somewhere?

No, that's the primary purpose of ScummTR. Before this tool existed, people would decipher the .LFL files by hand (it's trivial) and edit the strings with a hex editor (here's a very old example of such an older translation; it's full of typos, hacks and deep bugs -- ScummTR was created by Hibernatus as a response to this for the ATP team that was created for this translation, AFAIK).

You don't have to care about the string lengths (except for the '@' symbols used for padding objects/actors/verbs when they get renamed; see below). You can't use ScummTR to add or remove a full line, though. (Some translations sometimes require adding new strings, and this becomes more tedious, because it's not implemented in ScummTR itself. Technically, this feature could probably be added, but I'm "only" maintaining it, not really developing it (its original author is not really interested in it anymore either, 20 years later), and I don't have the interest/skills to make any big change to it.)

4. When using the translations, using the in-game save crashes the game. ScummVM state save and load work fine, but they don't reload the translations, (I think?)

Yes, that's expected. That's because ScummTR has to change the resource sizes for the newer string lengths and newer object/actor/verb names, and the saves do not expect this to happen.

This part is covered in this part of the FAQ.

When you're still working on your translation, I'd recommend playing with Boot Params, instead. It's what the original developers of the games used to play the game at various places, while working on it. Boot Params are not affected by the problems that you see with saves.

Once your translation is mostly done, it's probably a better moment to start doing a full play with saves (although they'll break again if you make new changes). And once your translation is completely frozen, of course the saves are not going to cause this problem anymore.

5. Can/should I also translate things like
```
a1-cageroom-robot-smash
```
which look like headers the game might look for?

No, leave them alone. I don't think there's an easy way for ScummTR to recognize them (otherwise we could just hide them). You need to know the context of the script containing these strings, just to be sure.

When in doubt, if you can't even get the string to be displayed in the game, just leave it as-is.

If you want to read the scripts themselves (it's kinda useful when you do a translation, in general), you can try using something like ScummEX, for example. Give it the ATLANTIS.001 file, and then explore the "rooms" contained inside each LFLF you see. Some resources can be decompiled (behind the scenes, it calls descumm.exe), e.g. 'LSCR', 'SCRP', 'EXCD', 'ENCD', 'VERB'. Then, hit the "Decompile Script" button to see what the script looks like.

If you do your ScummTR import/export calls with the '-h' option, you will see this kind of header, at the start of each line:

```
[006:LSCR#0200]These books don't look\016familiar.
```

The first number is the LFL/room number. Then it's the resource type, and then resource number.

This way, when you encounter an unknown string, you know which script needs to be decompiled in order to know a bit more about its context.
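If you want to process these headers with a script, a small Python parser is enough. This is only a sketch: the pattern is inferred from the single sample line above, and the `TaggedLine` type and `parse_tagged_line` function are hypothetical names, not something ScummTR provides.

```python
import re
from typing import NamedTuple, Optional

# Assumption: every header has the shape [room:TYPE#number], directly
# followed by the string, as in the one sample line shown in this thread.
HEADER = re.compile(r"^\[(\d+):(\w+)#(\d+)\](.*)$", re.DOTALL)


class TaggedLine(NamedTuple):
    room: int
    resource_type: str
    resource_number: int
    text: str


def parse_tagged_line(line: str) -> Optional[TaggedLine]:
    """Split a '-h' style line into its header fields and its text."""
    match = HEADER.match(line.rstrip("\n"))
    if match is None:
        return None  # no header on this line
    room, res_type, number, text = match.groups()
    return TaggedLine(int(room), res_type, int(number), text)


parsed = parse_tagged_line(r"[006:LSCR#0200]These books don't look\016familiar.")
print(parsed.room, parsed.resource_type, parsed.resource_number)  # 6 LSCR 200
```

The three fields tell you which room and which script resource to decompile when a string makes no sense out of context.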

(I see you mention GDB, so you're maybe quite familiar with the command-line interface. In that case, you can install scummvm-tools, run scummrp to extract all the resources, and manually run 'descumm -5' on the resource types given above. Or use NUTCracker for this.)

6. When dealing with @-padded strings, how do I know that
```
guard@@@@@@@@@
```
and
```
crushed guard
```
are of the same 'group' of fixed-length strings?

I've tried to keep strings that are similar and close together of the same length using @-padding, but not rigidly. So far, no apparent issues with memory corruption.

One way is to look at the context of the script, as described in the previous answer.

Another way is to not bother padding strings yourself, and just use the recommended `-A ao` option in your export/import. It will just pad the strings for you. It does so by padding a lot more than required, but unless you want to play your translation on an original DOS machine from 1990 with its very limited memory, it shouldn't matter.

Note that ScummVM doesn't care for strings not being properly padded, anyway. The original interpreters do care for this, though. You'll hit runtime script errors, if some strings don't have enough padding (actually, the official French release of Indy4 had such a fatal error…). If you care about being compatible with the original interpreters, I'd suggest doing a full gameplay of your translation with an original interpreter running under DREAMM or DOSBox.
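If you would rather pad by hand than rely on `-A ao`, the idea can be sketched as below. This is an illustration only: which names belong to the same group still has to come from reading the scripts, the `pad_group` helper is hypothetical, and padding to the longest name in the group is an assumption about what the original interpreters need, not something verified here.

```python
def pad_group(names, pad_char="@"):
    """Pad every name in a group with '@' up to the longest unpadded name."""
    stripped = [name.rstrip(pad_char) for name in names]
    width = max(len(name) for name in stripped)
    return [name.ljust(width, pad_char) for name in stripped]


# Assumed group: the two names an object switches between during the game.
for name in pad_group(["Guard@@@@@@@@@", "crushed guard"]):
    print(name, len(name))
```

Existing padding is stripped first, so the helper can be re-run after a translated name changes length.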

7. Do these translations already exist somewhere? If not, I'd be happy to contribute it once I'm done.

I'm not sure I understand your question, here.

You mean: do other fan translations exist? Yes. For Indy4 in particular, well, it requires quite a lot of work, because it has so much text, so I'm not sure that many of them exist.

I'm not really aware of a catalog of all the available translations. They can be added to the ScummVM detection tables, if they're completely done, and if and only if they're NOT distributed as full copies of the games.

8. How do I find the button caption offsets in the binary? Translated caption strings are of different length, and one would like to center them again.

Ah, yeah, that one. It's a hardcoded image. My notes say 'object #1263 in room 98'. You should be able to export/import it with NUTCracker, and some MS Paint usage. Or grab the resource from the official German/French releases of Indy4, where it has already been redrawn for larger verbs.

9. Is there a way to construct the text graphs from the output of ScummTR? That way, I could just glance over the conversations to gather the tone and exact meaning. Without context, things like "Well, now", "Hold on", "Come on" and many other shorter phrases just become impossible to accurately translate. Also, in my case, Dutch distinguishes formal from informal 'you', and it would make the whole translation so much better if I knew who was talking to whom.

Use the '-h' option and follow the procedure above to read the strings within their script context. It helps a lot.

And have fun with your translation!

hetmes · Post by **hetmes** » Sat Jan 04, 2025 10:27 am

Thank you so much for your insightful replies. On top of that, it's good to know I wasn't talking into a void.

As it just so happens, I believe I have just now arrived at an acceptable state of the Dutch translations. I am able to play the game until the end credits and I would like to move forward with making the translations available.

I will continue this discussion in the ScummTR forum and/or Discord.

Tsomi · Post by **Tsomi** » Sat Jan 04, 2025 1:21 pm

Sure! Yeah, the ScummVM forums are getting old, and with GitHub and Discord sending proper notifications (where you can subscribe only to the areas you're interested in), the forums tend to be overlooked today. I have a look at them from time to time, but it may take weeks before some posts get noticed, unfortunately.

Anyway, I don't have as much availability for ScummTR as I did when I rebooted the project, back in 2020-2021. But don't hesitate to ask questions, I'll try to help you as much as I can. Your questions are also very good feedback for understanding what may be missing from the tool or its documentation. So I hope to be able to find the time to focus on them again, in order to make a better full translation guide for ScummTR.

The tool has been there for a long time, but there are many unwritten rules for it, indeed. (Actually, one of the reasons why I asked its author whether he still had its source code, back in late 2019 or 2020, was to understand more about its usage by having a look at the code.)

hetmes · Post by **hetmes** » Thu Jan 16, 2025 2:38 pm

Another update, as I would like this knowledge to be searchable.

1.) Don't touch them
2.) Non-breaking space character hint
3.) No, I don't, at least when playing on ScummVM
4.) This never occurred again; possibly it was an incorrect use of ScummTR source translations
5.) I'm avoiding them at sight, but there's no way to identify which ones are actually used as display strings
6.) Irrelevant for ScummVM purposes, it seems.
7.) Thinking about how to contribute. There's an initiative to use Weblate to kick-start a collaborative effort, but I don't particularly like the rigidity of the approach. Longer story.
8.) The ATLANTIS.001 file is XOR-encoded with a single byte value, which is why I couldn't find the values in the binary file initially. Had to go deeper into the code with gdb to find that one. Then used some scripting and a hex editor to fix it. Rerunning ScummTR maintains the edited values. @Tsomi, I was just looking for the text rendering offset, not actually making the boxes wider or anything.
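For anyone searching later, the single-byte XOR step from point 8 looks roughly like this in Python. The key value is not stated in this thread; `0x69` below is an assumption that should be checked against the ScummVM source, and the helper names and sample bytes are made up for illustration.

```python
ASSUMED_KEY = 0x69  # assumption, not stated in this thread; verify before relying on it


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)


def find_plain(encoded: bytes, needle: bytes, key: int) -> int:
    """Offset of a plain byte sequence inside an XOR-encoded blob, or -1."""
    return encoded.find(xor_bytes(needle, key))


# Stand-in for file contents; a real run would read ATLANTIS.001 in binary mode.
sample = xor_bytes(b"Give\x00Pick up\x00", ASSUMED_KEY)
print(find_plain(sample, b"Pick up", ASSUMED_KEY))  # 5
```

Encoding the search pattern instead of decoding the whole file keeps the offsets valid for the file on disk, which is what a hex editor needs.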

Other issues I'm working on:
9.) I might use the room information or whatever ScummTR provides for extracted strings to group them together for translation, and that might also allow the automatic addition of a screenshot, which I think would greatly help lower the bar to contribute to any sort of collaborative translation system. But using that would mean adapting some existing project to display images with the translation UI. Maybe I'll look into that for the next translation project.
10.) The 'vent hydrogen' and 'drop ballast' strings are not among the ScummTR strings, and are possibly background images; I'm looking into NUTCracker to edit those as well.
11.) Editing the font files to allow various other characters that are commonly used in the Dutch language, mainly ë and ï. Also using NUTCracker for this.
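The grouping idea from point 9 could start from something like the Python sketch below. It assumes the strings were exported with the '-h' option and that every header has the `[room:TYPE#number]` shape of the one sample quoted earlier in this thread; the `group_by_room` helper and the second sample line are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: same [room:TYPE#number] header shape as the single sample
# line quoted earlier in this thread.
HEADER = re.compile(r"^\[(\d+):(\w+)#(\d+)\]")


def group_by_room(lines):
    """Map room number -> list of (resource id, text), keeping file order."""
    rooms = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        match = HEADER.match(line)
        if match is None:
            continue  # untagged line; skipped in this sketch
        room, res_type, number = match.groups()
        rooms[int(room)].append((f"{res_type}#{number}", line[match.end():]))
    return dict(rooms)


lines = [
    r"[006:LSCR#0200]These books don't look\016familiar.",
    "[006:LSCR#0200]Hold on.",  # made-up second line for illustration
]
for room, entries in group_by_room(lines).items():
    print(room, len(entries))
```

Each room bucket could then be paired with one screenshot of that room in a translation UI.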
