[amd64] Resurrect inlined fast tls
By default, all platforms call into native getters/setters whenever they need to access a tls value. On certain platforms we can be faster than this and avoid the call. We call this fast tls, and each platform defines its own way to achieve it. Fast tls should normally be inlined; otherwise there is little point in doing it at all (on Linux, __thread access is 2-3 instructions, on Mac pthread_getspecific is 2 instructions, and other platforms also have decent implementations).

To enable it, a platform has to define MONO_ARCH_HAVE_FAST_TLS and provide alternative getters/setters for a MonoTlsKey. To make these getters/setters fast, the platform has to declare a way to fetch an internal offset (MONO_THREAD_VAR_OFFSET), which is stored in the tls module, and the arch-specific file has to probe the system to see whether the offset initialized there can be used. If these run-time checks don't succeed, we just use the fallbacks.
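A minimal sketch of the selection logic described above, assuming a GCC/Clang toolchain with `__thread` and POSIX threads. All names here (`TlsKey`, `tls_init`, `tls_get_fast`, `tls_get_fallback`, etc.) are hypothetical and are not Mono's API, and the result of the platform probe is passed in as a boolean rather than computed from a real offset. A real JIT would emit the fast access inline into generated code instead of calling through a function pointer; the pointers below only model choosing, once at startup, between the fast path and the always-correct fallback.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a runtime's tls key enum (not Mono's MonoTlsKey). */
typedef enum { TLS_KEY_DOMAIN, TLS_KEY_COUNT } TlsKey;

/* Fallback: one pthread key per TlsKey, accessed through library calls. */
static pthread_key_t fallback_keys[TLS_KEY_COUNT];
static bool keys_created = false;

static void *tls_get_fallback(TlsKey key) { return pthread_getspecific(fallback_keys[key]); }
static void tls_set_fallback(TlsKey key, void *value) {
    /* Return value ignored for brevity; it is nonzero only on failure. */
    pthread_setspecific(fallback_keys[key], value);
}

/* Fast path: compiler-managed __thread slots (GCC/Clang extension). */
static __thread void *fast_slots[TLS_KEY_COUNT];

static void *tls_get_fast(TlsKey key) { return fast_slots[key]; }
static void tls_set_fast(TlsKey key, void *value) { fast_slots[key] = value; }

/* Accessors currently in use; default to the fallback until probed. */
static void *(*tls_get)(TlsKey) = tls_get_fallback;
static void (*tls_set)(TlsKey, void *) = tls_set_fallback;

/* fast_offset_usable models the outcome of the hypothetical run-time probe
 * (in the text: whether the offset fetched via MONO_THREAD_VAR_OFFSET works). */
static void tls_init(bool fast_offset_usable) {
    if (!keys_created) {
        for (int i = 0; i < TLS_KEY_COUNT; i++)
            if (pthread_key_create(&fallback_keys[i], NULL) != 0)
                abort();
        keys_created = true;
    }
    if (fast_offset_usable) {
        tls_get = tls_get_fast;
        tls_set = tls_set_fast;
    } else {
        /* Probe failed: keep using the native getters/setters. */
        tls_get = tls_get_fallback;
        tls_set = tls_set_fallback;
    }
}
```

Calling `tls_init(false)` keeps the pthread-based fallback; `tls_init(true)` switches to the `__thread` slots, which in this model is only safe once the probe has confirmed the fast path works.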
If we wanted to provide fast inlined tls for aot code, we would need to be sure that these two platform checks never fail at run time; otherwise the tls getters/setters that we emitted would not work. Normally there is little incentive to support this, since tls access is most common in wrappers and managed allocators, neither of which is aot-ed by default. So far, we have never supported inlined fast tls on full-aot systems.
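To make the aot constraint concrete, here is a small hypothetical model (not Mono code; `AotImageInfo`, `RuntimeTlsInfo` and `aot_image_tls_compatible` are illustrative names). An aot image records whether its code inlines tls access and which offset that code was compiled against, and the loader compares this with the result of the run-time probe. With a JIT available, a mismatch could be handled by generating code that uses the fallbacks instead; on a full-aot system there is no JIT to do that, so the check would have to be guaranteed to pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: what an aot image would need to record about inlined tls. */
typedef struct {
    bool uses_inlined_tls; /* did the aot compiler emit inlined fast tls? */
    ptrdiff_t tls_offset;  /* offset baked into that emitted code */
} AotImageInfo;

/* Hypothetical: result of the run-time platform probe on this system. */
typedef struct {
    bool fast_tls_available; /* did the arch-specific probe succeed? */
    ptrdiff_t tls_offset;    /* offset found at run time */
} RuntimeTlsInfo;

/* The image's tls code is only valid if its compile-time assumptions still
 * hold: either it never inlined tls access, or the probe succeeded and
 * produced exactly the offset the code was compiled against. */
static bool aot_image_tls_compatible(const AotImageInfo *image,
                                     const RuntimeTlsInfo *rt) {
    if (!image->uses_inlined_tls)
        return true; /* image calls the generic getters/setters */
    return rt->fast_tls_available && rt->tls_offset == image->tls_offset;
}
```

An image built without inlined tls is always usable, which matches the current situation where inlined fast tls is simply not emitted for full-aot.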
15 files changed: