logs archiveBotHelp.net / Freenode / #3dsdev / 2015 / September / 3 / 4
profi200
gcc and g++ have the same option as the Fortran compiler.
https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Optimize-Options.html See under -Ofast
yuriks
I love how -Ofast is actually a thing, lol
like, -O4 wasn't enough, so let's go -Ofast
xyz
it even says these options are fortran specific
yuriks
hmm, yeah
I don't think it has any effect on C
profi200
Never saw -O4 anywhere.
yuriks
the only way it makes sense anyway is if you're using VLAs, which are the black sheep of C99 features :P
xyz
all local arrays are already on the stack
yuriks
profi200: yes, I meant "-O4 doesn't sound cool enough, let's use -Ofast"
wow, fortran has some really trippy optimization options:
-faggressive-function-elimination
Functions with identical argument lists are eliminated within statements, regardless of whether these functions are marked PURE or not. For example, in
profi200
Fortran kicks both C and C++'s ass in terms of speed. No idea how usable the language is.
yuriks
the power of language semantics
endrift
So -O4 used to mean -O3 -flto I think
at least in clang
it doesn't anymore afaik
I might just be imagining that but I swear it was a thing a few years ago
profi200
https://www.raspberrypi.org/forums/viewtopic.php?f=33&t=98354 For some reason the gcc used for RPi has -O4 (scroll down).
yuriks
I bet it's just clamping to 3
endrift
^
you can type -O9 if you want, but it still clamps to -O3
profi200
I wonder why people use that then xD
"4 is higher than 3 so it must be better!"
Lectem
fortran has too many levels
profi200
I would still try to get any form of pointer aliasing out of mGBA if it is not needed. The compiler apparently can generate good code if you load all values you want to change first, modify them and write them back in separate steps. That will eliminate some ldr instructions which slow it down.
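A minimal sketch of the load, modify, write-back pattern described above. The struct and function names are hypothetical and are not taken from mGBA. In the first function, `out` has the same element type as the struct fields, so the compiler has to assume a store through `out` may change them and reloads both on every iteration. The second function copies the fields into locals, so the loop body needs no reloads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical emulator state, for illustration only. */
struct counters {
    uint32_t cycles;
    uint32_t scanline;
};

/* `out` may alias c->cycles or c->scanline as far as the compiler knows,
 * so both fields are reloaded after every store to out[i]. */
void step_naive(struct counters *c, uint32_t *out, int n)
{
    assert(c && out);
    for (int i = 0; i < n; i++) {
        c->cycles += 4;
        out[i] = c->cycles + c->scanline;
    }
}

/* Load once, work on locals, write back once at the end. */
void step_local(struct counters *c, uint32_t *out, int n)
{
    assert(c && out);
    uint32_t cycles = c->cycles;
    uint32_t scanline = c->scanline;
    for (int i = 0; i < n; i++) {
        cycles += 4;
        out[i] = cycles + scanline;
    }
    c->cycles = cycles;
}
```

Both functions produce the same results as long as `out` does not actually overlap `*c`; the second version simply makes that assumption explicit in the code.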
endrift
well yes
pointer aliasing and stuff leads to garbage like that
but the heaviest functions in the emulator are the interpreter, which has no type punning, and the software renderers, which do a bit of pointer aliasing but it's just a lot of work regardless
if I could speed up the software renderer, that would be amazing
but it's pretty heavily optimized at this point
profi200
Your emu seriously needs such optimizations if it should run well on old 3DS at some point.
endrift
I never expected it to
like
I'm amazed it runs this well on the N3DS
the port was stalled for a long time because I never thought it'd be fast enough
I have a branch where I offload rendering onto a second thread and on PC it speeds things up ~50% at times
like, holy sh*t that's a lot
but I don't know if it'd translate to other platforms
yuriks
you might get a similar speed up on the n3DS if you use the second core
endrift
that would be great, but I doubt it would translate to the O3DS
yuriks
o3DS you only get 30% of the core time (vs. 80% on the n3DS) so yeah, not that much
should give more headroom on the n3DS tho
endrift
it still might give a speedup on the O3DS
profi200
You will have synchronization overhead however.
yuriks
did anyone write a decent (or heck, *any*) set of userland synchronization primitives?
profi200
And possible sync issues for audio and video.
yuriks
because if you're doing multi-core, and using exclusively the kernel stuff, yeah, perf is going to be garbage :/
profi200
blargSnes had audio stuff on the separate core but it caused more problems than it solved.
endrift
right, the way the FIFO for the PPU is set up on this thread, the only synchronization that's done is: waking up the second thread if it's idling, and waiting at the end of a frame
the FIFO itself is lock-free
but the second thread needs to know if it has stuff in it again
but the problem is the wakeup might be expensive
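For readers unfamiliar with the idea, a lock-free FIFO between exactly one producer thread and one consumer thread can be sketched with C11 atomics as below. This is not mGBA's actual FIFO; the names, the element type and the fixed size are assumptions made for illustration. Note that it only covers the data path: putting an idle consumer to sleep and waking it again, which is the expensive part discussed here, still needs a separate mechanism.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 256u
static_assert((FIFO_SIZE & (FIFO_SIZE - 1)) == 0, "size must be a power of two");

/* Single-producer, single-consumer ring buffer (hypothetical layout). */
struct fifo {
    uint32_t data[FIFO_SIZE];
    atomic_uint head; /* next slot to write; stored only by the producer */
    atomic_uint tail; /* next slot to read; stored only by the consumer */
};

/* Producer side. Returns false if the FIFO is full. */
bool fifo_push(struct fifo *f, uint32_t v)
{
    unsigned head = atomic_load_explicit(&f->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_acquire);
    if (head - tail == FIFO_SIZE)
        return false;
    f->data[head % FIFO_SIZE] = v;
    /* Release: the data write above becomes visible before the new head. */
    atomic_store_explicit(&f->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the FIFO is empty. */
bool fifo_pop(struct fifo *f, uint32_t *v)
{
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&f->head, memory_order_acquire);
    if (head == tail)
        return false;
    *v = f->data[tail % FIFO_SIZE];
    /* Release: the slot is handed back only after it has been read. */
    atomic_store_explicit(&f->tail, tail + 1, memory_order_release);
    return true;
}
```

The indices are free-running unsigned counters, so `head - tail` is the fill level even after wraparound, provided the size is a power of two.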
yuriks
just leave it spinning :^)
endrift
anyway I haven't ported the threading abstraction to the 3DS yet
I still need to do that
yuriks
but well, that should be totally doable
endrift
LWP is it?
yuriks
hm?
endrift
the threading system
profi200
The 3DS has events but requires yet another system call just to signal an event.
yuriks
endrift: what does LWP stand for?
endrift
I have no idea
maybe I'm thinking of on gekko
yuriks
"Light-weight process"
yeah I think you are
the 3DS has kernel-level threads
endrift
yikes
oh well, you need that for SMP
ABigDeal
lwp is the thread system for libogc
yuriks
cooperative on the appcore, though the syscore is preemptive I think
profi200: you should only be using those for IPC synchronization
same for the other ones
Nintendo uses atomics for everything, and AddressArbiters if it needs to sleep a thread
(AddressArbiters are similar to Linux futexes as far as I can tell, but I don't know much about how to use them)
profi200
No idea what AddressArbiter actually is or how it works :>
yuriks
they allow a thread to block until a condition is satisfied and another thread notifies that
so for a semaphore you try to decrement using atomics and if it underflows then you WaitSynch on the arbiter instead
next thread, if it increases the semaphore count from 0, will notify the arbiter, which will wake up the other thread
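The scheme described above can be sketched roughly as follows. This is a portable illustration, not 3DS code: `arbiter_wait_while_equal` and `arbiter_signal` are hypothetical stand-ins for the address arbiter, emulated here with a pthread mutex and condition variable so the example runs on a PC. As a small variation on the description, the fast path uses a compare-and-swap that never lets the count drop below zero, and the signal wakes every waiter so that two quick posts cannot leave one waiter asleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the arbiter: one global lock and condvar. */
static pthread_mutex_t arb_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t arb_cond = PTHREAD_COND_INITIALIZER;

/* Sleep for as long as *addr still holds `value`. */
static void arbiter_wait_while_equal(atomic_int *addr, int value)
{
    pthread_mutex_lock(&arb_lock);
    while (atomic_load(addr) == value)
        pthread_cond_wait(&arb_cond, &arb_lock);
    pthread_mutex_unlock(&arb_lock);
}

/* Wake all sleepers (this stand-in does not track addresses). */
static void arbiter_signal(atomic_int *addr)
{
    (void)addr;
    pthread_mutex_lock(&arb_lock);
    pthread_cond_broadcast(&arb_cond);
    pthread_mutex_unlock(&arb_lock);
}

struct usem { atomic_int count; };

void usem_init(struct usem *s, int initial)
{
    assert(initial >= 0);
    atomic_init(&s->count, initial);
}

/* Fast path only: take a token without ever sleeping. */
bool usem_trywait(struct usem *s)
{
    int c = atomic_load(&s->count);
    while (c > 0) {
        /* On failure, c is refreshed with the current count. */
        if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
            return true;
    }
    return false;
}

/* Take a token, sleeping on the arbiter while the count is zero. */
void usem_wait(struct usem *s)
{
    while (!usem_trywait(s))
        arbiter_wait_while_equal(&s->count, 0);
}

/* Add a token; only the 0 -> 1 transition needs to wake anyone. */
void usem_post(struct usem *s)
{
    if (atomic_fetch_add(&s->count, 1) == 0)
        arbiter_signal(&s->count);
}
```

In the uncontended case both `usem_wait` and `usem_post` are a single atomic operation with no call into the slow path, which is the point of the approach: the expensive wait and wake calls only happen when a thread actually has to sleep.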
endrift
oh no, there are no condition variables ;_;
yuriks
endrift: you can implement them with ^
endrift
I know how they work
I just wanted fewer kernel calls
yuriks
?
endrift
the way the FIFO works uses condition variables, which are emulated with mutexes and semaphores
I don't have a raw semaphore in the threading abstraction
maybe I should add one
yuriks
you don't need to emulate them
again, using the kernel mutexes and semaphores is a bad idea
endrift
hm?
oh, using atomics
yuriks
your mutex should be done with atomics + the arbiter, and the condvar can block threads using AddressArbiters
endrift
how do I do this on the 3DS though
yuriks
will need some figuring out, you might be the first person to do decent threading on 3ds homebrew lol
profi200
Is there any example for using AddressArbiter?