I spent the last few days working on the sprite loading / display.
My code is 90% complete but of course, completely untested for now.
It will probably take some time before I post new pictures including those.
The sprite code accounts for 20% of the whole design, and 70% of the generated logic in the FPGA.
Once I have finished that, I have a lot of small details and things to finish.
The TODO list is maintained inside the SVN anyway.
2010/05/04
2010/05/01
Changes...
Today, I managed to fix the following bugs :
- Color blending overflow detection had a problem with "half" color; this is now fixed and colors look good.
- Tested a mode7 screen from a game : the coordinate system is now fixed.
Some timing adjustment may be necessary but everything seems to be fine otherwise.
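Here is a minimal software sketch of the kind of blending involved, written in Python rather than taken from the VHDL of the project. It assumes 5-bit color channels and assumes that the "half" problem came from clamping before halving; both the function names and that diagnosis are my assumptions, not something the post states.

```python
def blend_channel(main, sub, subtract=False, half=False):
    """Blend one 5-bit colour channel (0..31) of the main and sub screens."""
    # Use a wider intermediate value so the sum cannot wrap around.
    result = main - sub if subtract else main + sub
    if half:
        # Assumed ordering: halve first, clamp afterwards.
        result >>= 1
    return max(0, min(31, result))


def blend_bgr15(main, sub, subtract=False, half=False):
    """Blend two 15-bit colours laid out as 0bbbbbgggggrrrrr."""
    out = 0
    for shift in (0, 5, 10):
        m = (main >> shift) & 0x1F
        s = (sub >> shift) & 0x1F
        out |= blend_channel(m, s, subtract, half) << shift
    return out
```

With this ordering, two full-white pixels blended with "half" stay white; clamping to 31 before halving would give 15 instead, which is the sort of visible error an overflow bug produces.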
I also decided to write a prototype of the sprite system.
To decide the priority of the sprites when drawing ONE pixel, I must have simultaneous access to the 34-tile sprite cache for the current pixel, do a priority check on the valid pixels and take the pixel with the highest priority.
This results in having the following information per sprite :
- Palette
- Priority
- Start X
- 16x2 bits of sprite info. (8 pixels @ 4 BPP for 1 line)
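The per-pixel selection described above can be modelled in software as follows. This is only a sketch under assumptions: the class and function names are hypothetical, color index 0 is treated as transparent, and the slot order in the cache is taken as the priority order (the first slot with an opaque pixel wins). The hardware checks all 34 slots in parallel; the loop here is just the sequential equivalent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SpriteTile:
    """One cached sprite tile line (field names are illustrative)."""
    palette: int
    priority: int
    start_x: int
    pixels: List[int]      # 8 colour indices (4 BPP), 0 = transparent
    valid: bool = True


def sprite_pixel(tiles: List[SpriteTile], x: int) -> Optional[Tuple[int, int, int]]:
    """Return (palette, colour, priority) of the winning sprite pixel at x."""
    for tile in tiles:                 # slot order = assumed priority order
        if not tile.valid:
            continue
        offset = x - tile.start_x
        if not 0 <= offset < 8:        # tile does not cover this pixel
            continue
        colour = tile.pixels[offset]
        if colour == 0:                # transparent, let lower slots through
            continue
        return tile.palette, colour, tile.priority
    return None                        # no sprite pixel here
```

The returned priority is what would then be compared against the BGs to produce the final pixel.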
As the sprite units are also loaded while the display is performed, I do bank switching and have to store TWICE that amount of data.
One possible trick is to do the LOADING of the sprites into a real MEMORY and not flip-flops
and then do the transfer internally during the H-Blank.
(I would need 34 cycles, that seems possible)
Anyway for now the sprite system is just HUGE.
It is not complete yet (the state machine for the tile loading is not done yet but it will not be that huge) but I already reach 2835 slices of my Virtex4. It is like 18% of my SX35 Virtex just for the sprite system...
Even if I manage later to create a different architecture, I will only reduce that by less than 50%, most likely 40%, even if I remove 50% of the internal registers.
Basically, once the PPU is finished, consider the sprite system to consume around 40% of the PPU design.
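To give a rough idea of where the size comes from, here is a back-of-the-envelope estimate in Python. The field widths (3-bit palette, 2-bit priority, 9-bit start X) are my assumptions based on the usual SNES sprite attributes, and the figures of 2 flip-flops per slice and 15360 slices for the SX35 are taken from my reading of the Virtex-4 documentation; none of these numbers come from the design itself.

```python
# Assumed field widths per cached sprite tile.
PALETTE_BITS = 3
PRIORITY_BITS = 2
START_X_BITS = 9
PIXEL_BITS = 16 * 2          # 8 pixels @ 4 BPP for one line

TILES = 34                   # tiles in the sprite cache
BANKS = 2                    # bank switching doubles the storage

FF_PER_SLICE = 2             # assumed for Virtex-4
SX35_SLICES = 15360          # assumed total for the SX35
REPORTED_SLICES = 2835       # figure quoted in the post


def storage_bits():
    """Total number of register bits held in flip-flops."""
    per_tile = PALETTE_BITS + PRIORITY_BITS + START_X_BITS + PIXEL_BITS
    return per_tile * TILES * BANKS


def min_slices_for_registers():
    """Lower bound on slices if they were used for storage only."""
    return storage_bits() // FF_PER_SLICE


def reported_share_percent():
    """Share of the device used by the sprite system, in percent."""
    return 100.0 * REPORTED_SLICES / SX35_SLICES
```

Under these assumptions the registers alone need a bit more than 3000 flip-flops, so over half of the reported slices would be storage before counting any multiplexers or priority logic.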
In the best case, I could reduce the sprite unit to X / BPP0123.
Use a RAM to do the transfer during HBlank, use another RAM to store the visible pal/prio of sprites.
Because RAM needs one cycle for the access compared to my preloaded registers,
I would need to read the sprites one pixel ahead of the BGs.
Then I may be lucky enough to reduce this design to 40% of its original size.
It is something I may do later on, when everything is working fine...
But for now, I prefer an easy design, no timing issue.
Pixel / Priority => Output.
2010/04/30
Mode 0 / Mode 3 / Mode 4 Tests.
I started to check various ROMs.
Mode 0 seems to be OK.
Then checked mode 3:
This is the screen shot from the emulator.
And here is what I get :
The difference is that I do not have sprites yet, so I can not display the text.
And it also uses HDMA for color rasters, which I do not have yet either.
So for what has been implemented so far, everything is good.
Mode 4 :
I have a glitch, it seems that I read the tiles as if they were in 16 pixel tile mode,
where actually it must access them as 8 pixel tiles.
Too tired now, I will check that tomorrow. I remember the specs saying that for this mode it MUST be 16 pixels... maybe mode 5 ? Too sleepy anyway... Let's save some work for tomorrow :-)
The infamous mode 7.
Ok, I got the mode7 to work today.
There is still some stuff to verify.
The mode7 flip H/W is not working, and the parts of the specification using mosaic and interlace are not done yet.
But for most games, it should work as is anyway...
Of course I put the remaining stuff on the TODO list. I do not forget it.
Anyway I posted the pics to show how I tested :-)
2010/04/24
Snes on FPGA Information.
One of my friends asked me how I could get those screens without having a CPU yet...
Yes, he is right.
What I did is that I wrote a DLL which is plugged into Snes9x.
That DLL tracks every register write to the PPU.
When the "screenshot" feature of Snes9x is called, I also dump the VRAM/CGRAM and all the registers directly into a VHDL program, then I recompile the complete chip and test that particular state.
It is an annoying and lengthy process : I recompile the chip and resend it to the board,
and it takes 2 to 3 minutes to update a new design, plus the time to copy the files from one place to another, etc.
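The dump step can be pictured with the following Python sketch. It is not the C++ code of my DLL: the function name, the table naming scheme and the exact VHDL layout are assumptions made for illustration. It turns a list of memory words into a VHDL constant array that can be compiled into the design.

```python
def to_vhdl_table(name, words, width_bits=16):
    """Render a list of integers as a VHDL constant array declaration."""
    if width_bits % 4 != 0:
        raise ValueError("width must be a multiple of 4 for x\"..\" literals")
    if not words:
        raise ValueError("need at least one word")
    digits = width_bits // 4
    mask = (1 << width_bits) - 1
    lines = [
        f"type {name}_t is array (0 to {len(words) - 1}) "
        f"of std_logic_vector({width_bits - 1} downto 0);",
        f"constant {name} : {name}_t := (",
    ]
    # Named association (index => value) stays valid even for one element.
    entries = [f'    {i} => x"{w & mask:0{digits}X}"' for i, w in enumerate(words)]
    lines.append(",\n".join(entries))
    lines.append(");")
    return "\n".join(lines)
```

For example, `print(to_vhdl_table("CGRAM_INIT", [0x0000, 0x7FFF]))` emits a two-entry table; a real dump would pass the full VRAM or CGRAM contents captured at screenshot time.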
So now I am working on using my serial port and have my DLL do a direct transfer of the VRAM, register states to the chip without recompiling the design.
That is going to be a big plus, it will shorten the debugging time, but also allow me to modify the registers in real time and see if everything is fine.
PS : I forgot to say in my previous post that mosaic is also working nicely.
New screens.
Last time, I did hard code the priority of the BGs,
but now everything is fine.
- The super mario screen now uses MAIN / SUB to display.
- Axelay shadow effect with the space ship at the beginning is OK too.
- Fixed some timing glitches.
- Fixed when PPU is disabled.
- Fixed bug for sprites (palette 0..3 do not participate in Main/Sub color math)
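The sprite rule from the last fix can be written down as a tiny Python predicate. This is only an illustration: the function and parameter names are hypothetical, and it assumes 8 sprite palettes numbered 0..7.

```python
def sprite_color_math_enabled(sprite_palette, obj_math_enabled):
    """Tell whether a sprite pixel takes part in Main/Sub color math.

    sprite_palette   -- palette number of the sprite (assumed 0..7)
    obj_math_enabled -- whether color math is switched on for sprites at all
    """
    if not 0 <= sprite_palette <= 7:
        raise ValueError("sprite palette must be in 0..7")
    # Palettes 0..3 never participate, whatever the register says.
    return obj_math_enabled and sprite_palette >= 4
```

A sprite using palette 2 is therefore drawn as-is, while the same sprite with palette 5 would be blended when color math is enabled for sprites.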
So here are a few screens now.
- Konami logo booted from Axelay
- Village from FF6
- Fortress from FF6.
I still have things to fix of course, but things are getting nicer little by little. :-)
Now it finally starts to look like a Snes ;-)
2010/04/18
VHDL Code for the Snes.
I decided to post my code on Google Code repository to allow me to work easily with SVN.
Here is the URL : http://code.google.com/p/fpgasnes/
If anybody is interested in keeping in touch with me :
I have a hotmail.com account and you know my nickname :-) so it is :
[my nickname] @ hotmail.com
No MSN please, I am quite busy and do not want to be disturbed at work.
Something to show...
Well, I bet nobody visits this blog anymore.
But anyway I am going to publish the results of my work.
I have completed what I think is a good beginning of the Snes PPU chip.
It has all the logic/structure for everything except sprites.
Of course, some inner parts are still untested, but the structure is in place and the timing has at least been thought through.
Now, what has been proven to work on my FPGA board :
- BG 1/2/3/4 memory access for all graphic modes is done.
- Of course support for the various bitmap formats of each mode.
- Support of tile flip 8/16 pixels.
- Support of map of 32/64 tiles mode.
Out of the three bugs I was looking for, two were related to mistakes in the SPEC.
(The documents are not accurate in two places)
My own bugs were :
- I had to disable the register writing side of the PPU that went berserk and destroyed the test data.
- I use a modified version of Snes9x with a custom DLL to trap all the calls to the registers and dump memory into VHDL tables, to be able to take a "screenshot" of the graphics chip state and compile it into my design to verify that the chip does the rendering correctly.
(Big-mega-thanks to ThunderZ for the support, and also for the future BSnes patching, as I am too busy with other stuff)
The for-loop in my DLL that was dumping the table had A BUG. I lost two weeks because of a LOOP in C++, while like 4000 lines of VHDL CODE WERE CORRECT RIGHT AWAY WITHOUT ANY SIMULATOR !!! Shit... gotta kick my ass sometime.
What remains soon :
- Mode7 needs to be finished.
- Mosaic is half-baked.
- Verify and debug window / main-sub variation.
What remains later :
- HiRes
- Interlace (need to check what that means... the chip actually already works at double resolution)
- Sprites
Here's some pix for the curious :
Trying to debug... GARBAGE. I divided the screen into blocks 32 pixels high, and the displayed value reflected internal register values.
After finding that I was reading garbage and that my CPU-side register logic was going berserk, I fixed it and started to have something like that after a few hours of debugging.
And after a whole week-end, it gives something like that.
My main-sub unit still not being ready, I hard coded the priority just to display the 3 BGs correctly.
2010/04/04
Snes on FPGA.
I have been working for a few weeks (7 weeks) on a Snes on FPGA.
For now I am focusing only on the graphic chip; the chip implementation now reaches around 6k lines of VHDL code and I started debugging a few days ago.
Let's see if I can start to have something consistent to publish within a few weeks from now.
(i.e. a screenshot would be nice)
Cheers.
2008/08/30
ETC1 Texture decoder.
I try to publish at least one post per week.
Now, posting something REALLY interesting and valuable
each week is actually quite some work.
As I do not have so much time, I will publish about
something I have done some weeks ago...
It is about a hardware decoder for ETC1 texture compression
format.
The decoder works as follows :
- Input : a 64 bit data chunk with the compressed data.
- Input : address of the pixel to decode in a 4x4 pixel block.
- Output : an RGB value, 888 bit format.
- Output : optionally an overflow detector for the ETC2 extension.
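The interface listed above can be modelled in software with the Python sketch below. It is not the VHDL source: the function name is hypothetical, and the bit layout follows my reading of the public ETC1 description (OES_compressed_ETC1_RGB8_texture), with the block taken as one big-endian 64-bit integer and pixels numbered column by column. Like the hardware version, it is a pure function with no state.

```python
# Intensity modifier tables: (small, large) magnitude per table codeword.
MODIFIERS = [(2, 8), (5, 17), (9, 29), (13, 42),
             (18, 60), (24, 80), (33, 106), (47, 183)]


def etc1_decode_pixel(block, x, y):
    """Decode pixel (x, y) of a 4x4 block; returns ((r, g, b), overflow)."""
    hi = (block >> 32) & 0xFFFFFFFF
    lo = block & 0xFFFFFFFF
    diff = (hi >> 1) & 1               # differential mode bit
    flip = hi & 1                      # sub-block orientation bit
    second = (y >= 2) if flip else (x >= 2)

    overflow = False
    base = []
    for shift in (24, 16, 8):          # R, G, B bytes of the high word
        byte = (hi >> shift) & 0xFF
        if diff:
            c1 = byte >> 3             # 5-bit base colour
            delta = byte & 0x7         # 3-bit two's complement delta
            if delta >= 4:
                delta -= 8
            c2 = c1 + delta
            if not 0 <= c2 <= 31:      # illegal in ETC1
                overflow = True
            c = (c2 & 0x1F) if second else c1
            base.append((c << 3) | (c >> 2))
        else:
            c = (byte & 0xF) if second else (byte >> 4)
            base.append((c << 4) | c)

    table = ((hi >> 2) if second else (hi >> 5)) & 0x7
    i = x * 4 + y                      # pixel index, column-major
    msb = (lo >> (16 + i)) & 1
    lsb = (lo >> i) & 1
    small, large = MODIFIERS[table]
    modifier = large if lsb else small
    if msb:
        modifier = -modifier
    rgb = tuple(max(0, min(255, c + modifier)) for c in base)
    return rgb, overflow
```

The overflow flag is raised when a differential base color leaves the 5-bit range, which is the condition an ETC2-style extension would reuse to signal its extra modes.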
The decoder follows the description given by the original paper
that published this encoding format.
Here is also the CORRECT description of the 64 bit chunk
format as it was not described in the paper.
Finally here is the source code [TODO link] of the texture decoder in
VHDL language, it is quite small and could be useful for an FPGA project.
One must note that the logic has no register, thus allowing the translation
to be performed transparently during the texture lookup itself,
and avoiding a "pre-decompression" stage during texture fetching.
I would really be glad to get my hands on an ETC2 software encoder.
If I did, I would write a VHDL ETC2 decoder and look at the difference
in terms of cost and performance over an ETC1 decoder.
By the way, the ETC1 decoder has been FPGA proven, decoding
faithfully the "elina" picture given as a sample with the ETC1 encoder.
Enjoy !
2008/08/22
Shopping !!!
I promised myself to update the blog at least once a week...
Well, I could not resist after one day :-P.
The thing is that I went to Akihabara this morning
to buy two LCD screens. Really cheap ones !
300 Yen (roughly less than 2 euro) for a 400x96 pixel
RGB 666 display.
(If you have read the previous article,
things should start to make sense to you)
Actually I had to buy a lot of other stuff too and
ended up spending like 8000 yen. Anyway, here is
the information you can find about those LCD screens
(page is in Japanese)
The thing is that the connector is a 0.5mm pitch w/ 36 pins.
That is the kind of stuff that requires to be
a true masochist to connect...
By some miracle I found the following flexicable,
that ALSO performs the conversion to more "hobby" standard
connectors. (770 yen per piece)
The cool stuff about this is that you can take a cutter
and adapt it to your size. That is REALLY smart and
avoids the hassle of :
- Messing directly with the LCD connector
by soldering directly with thin wires.
or
- Buying a normal flexicable
-> connector on the other side
-> solder the connector
-> etc...
Cut, solder the other part with standard pitch, FINISHED.
Here is the picture of this gift from the gods :
The good, the bad and the ugly. (in multicolor)
In this article, I will talk about displaying
more colors than your display can really do
using your FPGA.
Because your FPGA will allow you to perform
a per-pixel computation, we are able to do
some tricks that software / microcontroller
would find troublesome to do.
You want fun ? You are going to have it !
Here is the article file.
ART001 "Augmenting perceived color depth with simple hardware logic."
Here is the associated source code.(VHDL, unix formatting)
COD001
PS : I bet this kind of trick was actually done
already on Amiga / Atari ST by the demoscene for fixed images.
(Well I did it as a test program on my ST at least)
2008/08/21
Starting Again...
2006/10/13
Decided to make many technical articles on my blog...
After 8 months of sleep on this blog, starting my own web page and so on...
I have decided to build a lot of interesting (for me at least) technical articles on my accumulated knowledge, the kind of articles I would like to see more of...
So be ready soon for my weekly/daily publishing.
Because there is a LOT and LOT of knowledge to talk about.
Laxer3A
2005/12/26
Blog is started... New mobile phones too.
This blog is most likely to talk about various subjects...
Anyway, here are the subjects I would like to talk about :
- My hardware VHDL projects.
- Interesting links or ideas I saw.
- Interesting projects at my job.
- My personal software projects.
- Discussion about computer graphics and rendering.
- Discussion about computer graphics and rendering.
For today, I would like to announce some great news. I have spent multiple years doing demomaking on Atari ST (a really cool machine). My passion for "scratching the pixel" is still here.
I was lucky that in my current company, I got the opportunity to convince a huge Japanese operator that we could do better than the phone makers' implementations for graphics.
The specs of the latest 3G mobile phones are just impressive, look at the TI Omap 2600 chipset.
233 or 266 MHz is a common standard, and the rumour of a 133 or 150 MHz bus with DDR RAM is most likely true.
Compared to their software implementation, our solution is just a killer : roughly an average of x3 in performance when data are matching our algorithms.
Some examples : 240x240 fillRect @ 3500x/second.
Full screen drawn 4 times and still @ 60 fps.
Alpha fill Rect : 240x240 surface @ 330x/sec.
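To put those figures in perspective, here is a small Python calculation of the fill rates they imply. The function names are mine, and the pixels-per-cycle figure assumes the CPU really runs at one of the clock speeds quoted above, which I did not measure.

```python
def pixels_per_second(width, height, fills_per_second):
    """Raw fill rate for a rectangle filled a given number of times a second."""
    return width * height * fills_per_second


def pixels_per_cycle(pixel_rate, clock_hz):
    """Average number of pixels written per CPU clock cycle."""
    return pixel_rate / clock_hz


SOLID_FILL = pixels_per_second(240, 240, 3500)   # plain fillRect
ALPHA_FILL = pixels_per_second(240, 240, 330)    # alpha-blended fillRect

if __name__ == "__main__":
    print("solid fill :", SOLID_FILL, "pixels/s")
    print("alpha fill :", ALPHA_FILL, "pixels/s")
    # Assumed clock of 266 MHz, taken from the range mentioned in the post.
    print("solid fill per cycle @ 266 MHz :",
          round(pixels_per_cycle(SOLID_FILL, 266_000_000), 2))
```

That is about 200 million pixels per second for the plain fill and about 19 million for the alpha fill, so the plain fill writes roughly three pixels every four cycles at the assumed 266 MHz.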
Now, I can rest in peace, we did quite a good shot.