Well, I bow to your greater knowledge of the DS, but before I abandon the idea as unworkable, can you set me straight on the following?
agentq wrote:
No, it's not possible. The software scaler currently implemented scales from 320x200 to 256x200 and uses a large portion of the CPU. This is about as fast as it can be, and it doesn't have to cope with scaling in two directions.
640x480 is four times as much data, and not something that's worth attempting, IMO. There's also the problem of fitting all the data into memory: we only have 4MB of general-purpose memory.
Don't forget, the WinCE devices have much more memory and 300MHz+ CPUs, compared with the DS's 66MHz/33MHz ones.
The DS has, as you say, 4MB of general-purpose memory, but also (AIUI) 512K of VRAM.
512K of VRAM is enough to hold one 640x480 screen at 1 byte per pixel plus two 256x192 screens at 2 bytes per pixel (one for the scaled version, and one for the palette-looked-up section of the 640x480 one).
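To double-check that arithmetic, here is a small C sketch that just adds up the buffer sizes. It assumes exactly the figures above (four 128K banks, one 8-bit 640x480 buffer, two 16-bit 256x192 buffers) and nothing about how the hardware maps them; the names are made up for illustration.

```c
#include <stdint.h>

/* Buffer sizes in bytes, using the figures from the post (hypothetical names). */
enum {
    BUDGET_VRAM  = 4 * 128 * 1024,  /* four 128K banks: 524288 */
    BUDGET_BIG   = 640 * 480 * 1,   /* 8-bit paletted screen: 307200 */
    BUDGET_SMALL = 256 * 192 * 2,   /* 16-bit screen: 98304 */
};

/* Total bytes for one 640x480x1 buffer plus two 256x192x2 buffers. */
uint32_t vram_needed(void)
{
    return (uint32_t)BUDGET_BIG + 2u * (uint32_t)BUDGET_SMALL;
}

/* Bytes left over in the 512K; a negative value would mean it doesn't fit. */
int32_t vram_spare(void)
{
    return (int32_t)BUDGET_VRAM - (int32_t)vram_needed();
}
```

With those figures the three buffers need 503,808 bytes, leaving 20,480 bytes (20K) of the 512K spare, so it fits, if only just.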
Surely that means the larger screen can be accommodated without dipping into general memory?
My limited knowledge in this area, gleaned from skimming the DS docs, suggests that we can map the four 128K VRAM banks into one contiguous 512K block of memory.
We arrange one 256x192x2 screen to start at offset 0 within this block, follow it with the 640x480x1 screen, and put the second 256x192x2 screen after that.
This means that each of the 256x192x2 screens would be entirely contained within a single 128K bank of VRAM, and the 640x480x1 one would be in a single contiguous lump (spanning all four banks), making it easy for us to access.
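The layout can be checked the same way. This sketch, again with hypothetical names and using only the offsets described above, works out which 128K bank each end of each buffer falls in. It says nothing about whether the banks can really be mapped contiguously, which is the part I'm unsure of.

```c
#include <stdbool.h>
#include <stdint.h>

#define LAYOUT_BANK   (128u * 1024u)       /* one VRAM bank: 131072 bytes */
#define LAYOUT_SMALL  (256u * 192u * 2u)   /* 98304 bytes */
#define LAYOUT_BIG    (640u * 480u * 1u)   /* 307200 bytes */

/* Offsets in the combined 512K block, in the proposed order:
   small screen, big screen, small screen. */
#define SMALL0_OFFSET 0u
#define BIG_OFFSET    (SMALL0_OFFSET + LAYOUT_SMALL)   /* 98304 */
#define SMALL1_OFFSET (BIG_OFFSET + LAYOUT_BIG)        /* 405504 */

/* Index (0..3) of the 128K bank containing byte offset 'off'. */
static inline uint32_t bank_of(uint32_t off)
{
    return off / LAYOUT_BANK;
}

/* True if the range [off, off + size) lies entirely inside one bank. */
bool fits_in_one_bank(uint32_t off, uint32_t size)
{
    return size > 0 && bank_of(off) == bank_of(off + size - 1u);
}
```

The first small screen occupies offsets 0 to 98,303 (bank 0), the big screen 98,304 to 405,503 (banks 0 to 3), and the second small screen 405,504 to 503,807 (bank 3), so the layout works on paper.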
(It's entirely possible that there are limitations I've overlooked here, in which case I apologise.)
As for speed... the latest version of the CPU scaler takes 10.5ms, reading from 320x256 and writing to 256x192.
(Actually, why are we reading from 320x256? Surely for 320x200 games that's 56 lines too many? Maybe we should make the number of lines a parameter? Could that save 2ms?)
The 640x480 one would need to read just under 4 times as much data (307,200 pixels versus 81,920, i.e. 3.75 times) and write the same amount. A very rough estimate would therefore be around 4 times as slow? (Maybe less, as the write overhead is the same.)
That would suggest a time of around 40ms. That's still enough for 25fps, isn't it? (This ignores all other factors, which is clearly wrong, but my experience on other ports suggests that graphics blitting is the biggest computational hit.)
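The back-of-envelope numbers above can be reproduced in a few lines of C. This assumes, as the estimate does, that scaling time is proportional to the number of source pixels read, ignoring the fixed cost of writing the output; the function names are hypothetical.

```c
/* Scale a measured time by the ratio of source pixels read.
   Assumption: cost is proportional to pixels read; write cost ignored. */
double estimate_ms(double measured_ms, int measured_w, int measured_h,
                   int new_w, int new_h)
{
    return measured_ms * ((double)new_w * new_h)
                       / ((double)measured_w * measured_h);
}

/* Frames per second if a frame took frame_ms and nothing else ran. */
double fps_for(double frame_ms)
{
    return 1000.0 / frame_ms;
}
```

`estimate_ms(10.5, 320, 256, 640, 480)` comes out at 39.375ms (about 25.4fps if nothing else ran), and `estimate_ms(10.5, 320, 256, 320, 200)` at about 8.2ms, i.e. roughly 2.3ms saved by reading only 200 lines.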
Even if we really do get a huge slowdown in things like intros (where frame rate really matters), does it matter as much in actual gameplay?
As I say, I'm not trying to be a clever ass here and claim that I know better than you (cos clearly your experience and work on this port make you a much better judge of what is and isn't possible), but I'd like to understand the issues...
Robin