* fix(mixed-types): use f32 for precision and update the shared memory calculation logic for f32
* fix(unary): correct the gelu, gelu quick and gelu erf functions
* fix(flash-attn-tile): fix the hardcode v type
* fix(flash_attn): fix tile path
* fix: pass editorconfig and address the type conflicts
* fix: remove reduant pipeline keys
* fix: remove inline min/max group size functions and revert the flash attn path order
* fix: use clamp to avoid NaN for GELU
* fix: use the right range for exp, 80 is safer for f32 exp