Transactional memory

In computer science and engineering, transactional memory attempts to simplify concurrent programming by allowing a group of load and store instructions to execute atomically. It is a concurrency control mechanism, analogous to database transactions, for controlling access to shared memory in concurrent computing. Transactional memory systems provide a high-level abstraction as an alternative to low-level thread synchronization. This abstraction allows for coordination between concurrent reads and writes of shared data in parallel systems.^[1]

Motivation

Figure: Atomicity between two parallel transactions with a conflict.

In concurrent programming, synchronization is required when parallel threads attempt to access a shared resource. Low-level thread synchronization constructs such as locks are pessimistic: they prohibit threads that are outside a critical section from running the code protected by that critical section. Acquiring and releasing locks adds overhead that is often unnecessary in workloads with little conflict among threads. Transactional memory provides optimistic concurrency control by allowing threads to run in parallel with minimal interference.^[2] The goal of transactional memory systems is to transparently support regions of code marked as transactions by enforcing atomicity, consistency and isolation.

A transaction is a collection of operations that can execute and commit changes as long as a conflict is not present. When a conflict is detected, a transaction will revert to its initial state (prior to any changes) and will rerun until all conflicts are removed. Before a successful commit, the outcome of any operation is purely speculative inside a transaction. In contrast to lock-based synchronization where operations are serialized to prevent data corruption, transactions allow for additional parallelism as long as few operations attempt to modify a shared resource. Since the programmer is not responsible for explicitly identifying locks or the order in which they are acquired, programs that utilize transactional memory cannot produce a deadlock.^[2]
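The execute, detect, revert and rerun cycle described above can be sketched in a few lines. The following is a minimal illustration, not any particular library's implementation: the names `TVar`, `Transaction`, `Conflict` and `atomically` are hypothetical, and the design (per-cell version counters validated at commit time under a single commit lock) is an assumption chosen for brevity. Writes are buffered and stay invisible to other threads until the commit succeeds.

```python
import threading

_commit_lock = threading.Lock()  # assumed design: serializes only snapshot/commit steps


class TVar:
    """Hypothetical shared cell: a value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Conflict(Exception):
    """Raised when a cell read by the transaction changed before commit."""


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen at first read
        self.writes = {}  # TVar -> speculative value, not yet visible

    def read(self, tvar):
        if tvar in self.writes:  # a transaction sees its own writes
            return self.writes[tvar]
        with _commit_lock:  # consistent (value, version) snapshot
            value, version = tvar.value, tvar.version
        if self.reads.setdefault(tvar, version) != version:
            raise Conflict()  # cell changed since this transaction first read it
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value  # buffered until commit

    def commit(self):
        with _commit_lock:
            for tvar, seen in self.reads.items():
                if tvar.version != seen:
                    raise Conflict()
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1


def atomically(body):
    """Run body(tx) repeatedly until it commits without a conflict."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue  # speculative state is dropped with tx; rerun


def transfer(src, dst, amount):
    def body(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(body)
```

In this sketch a conflict costs only the discarded `Transaction` object, which is why optimistic schemes do well when conflicts are rare and poorly when they are frequent.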

With these constructs in place, transactional memory provides a high-level programming abstraction by allowing programmers to enclose their methods within transactional blocks. Correct implementations ensure that data cannot be shared between threads without going through a transaction and produce a serializable outcome. For example, code can be written as:

def transfer_money(from_account, to_account, amount):
    """Transfer money from one account to another."""
    with transaction():
        from_account.balance -= amount
        to_account.balance   += amount

In the code, the block defined by "transaction" is guaranteed atomicity, consistency and isolation by the underlying transactional memory implementation, and this enforcement is transparent to the programmer. The variables within the transaction are protected from external conflicts, ensuring that either the correct amount is transferred or no action is taken at all. Note that concurrency-related bugs are still possible in programs that use a large number of transactions, especially in software implementations where the library provided by the language is unable to enforce correct use. Bugs introduced through transactions can often be difficult to debug since breakpoints cannot be placed within a transaction.^[2]
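The example above relies on a `transaction()` construct that standard Python does not provide. To experiment with it, one possible stand-in is a single global lock wrapped in a context manager. This is an assumption made purely for illustration: it keeps concurrent transfers from interleaving, but it is pessimistic, gives none of the optimistic parallelism of a real transactional memory runtime, and does not roll back partial updates if the block raises. The `Account` class and `run_demo` helper are likewise hypothetical.

```python
import threading
from contextlib import contextmanager

_global_lock = threading.RLock()  # stand-in for a TM runtime (assumption)


@contextmanager
def transaction():
    """Emulate a transactional block with one global lock.

    Blocks never overlap, so each one appears atomic to the others.
    There is no speculation and no rollback in this stand-in.
    """
    with _global_lock:
        yield


class Account:
    """Hypothetical account type with a mutable balance."""

    def __init__(self, balance):
        self.balance = balance


def transfer_money(from_account, to_account, amount):
    """Transfer money from one account to another."""
    with transaction():
        from_account.balance -= amount
        to_account.balance += amount


def run_demo(pairs=2, transfers=1000):
    """Move money back and forth from several threads; return final balances."""
    a, b = Account(1000), Account(1000)

    def worker(src, dst):
        for _ in range(transfers):
            transfer_money(src, dst, 1)

    threads = []
    for _ in range(pairs):
        threads.append(threading.Thread(target=worker, args=(a, b)))
        threads.append(threading.Thread(target=worker, args=(b, a)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

Because equal numbers of transfers run in each direction, `run_demo()` should leave both balances at their starting value; without the `transaction()` block, lost updates could break that invariant.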

Transactional memory is limited in that it requires a shared-memory abstraction. Although transactional memory programs cannot produce a deadlock, programs may still suffer from a livelock or resource starvation. For example, longer transactions may repeatedly revert in response to multiple smaller transactions, wasting both time and energy.^[2]
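The starvation scenario can be made concrete with a toy model. The sketch below is a deterministic simulation, not a measurement of any real system: it assumes time advances in ticks, that a long transaction needs `long_steps` uninterrupted ticks to commit, and that a conflicting short transaction commits every `short_period` ticks, aborting whatever the long transaction has done so far. The function name and parameters are hypothetical.

```python
def simulate_long_transaction(long_steps, short_period, total_ticks):
    """Return (commits, aborts) for one long transaction under contention.

    Model assumptions: one tick is one unit of speculative work; every
    `short_period` ticks a short transaction commits a conflicting write,
    which discards the long transaction's work for the current attempt.
    """
    commits = 0
    aborts = 0
    progress = 0  # ticks of work done in the current attempt
    for tick in range(1, total_ticks + 1):
        progress += 1
        if tick % short_period == 0:
            # Conflict: revert to the initial state and start over.
            aborts += 1
            progress = 0
        elif progress == long_steps:
            # No conflict arrived during the attempt: commit.
            commits += 1
            progress = 0
    return commits, aborts
```

Under these assumptions, a 10-tick transaction facing a conflicting commit every 4 ticks never commits in 100 ticks (0 commits, 25 aborts), so all of its work is wasted. Shortening it to 3 ticks lets it commit 25 times over the same interval.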

Hardware vs. software

Figure: Hardware transactional memory using read and write bits.

The abstraction of atomicity in transactional memory requires a hardware mechanism to detect conflicts and undo any changes made to shared data.^[3] Hardware transactional memory systems may comprise modifications in processors, cache and bus protocol to support transactions.^[4]^[5]^[6]^[7]^[8] Speculative values in a transaction must be buffered and remain unseen by other threads until commit time. Large buffers are used to store speculative values while avoiding write propagation through the underlying cache coherence protocol. Traditionally, buffers have been implemented using different structures within the memory hierarchy such as store queues or caches. Buffers further away from the processor, such as the L2 cache, can hold more speculative values (up to a few megabytes). The optimal size of a buffer is still under debate due to the limited use of transactions in commercial programs.^[3]

In a cache implementation, the cache lines are generally augmented with read and write bits. When the hardware controller receives a request, the controller uses these bits to detect a conflict. If a serializability conflict is detected from a parallel transaction, then the speculative values are discarded. When caches are used, the system may introduce the risk of false conflicts due to the use of cache line granularity.^[3]

Load-link/store-conditional (LL/SC) offered by many RISC processors can be viewed as the most basic transactional memory support; however, LL/SC usually operates on data that is the size of a native machine word, so only single-word transactions are supported.^[4] Although hardware transactional memory provides maximal performance compared to software alternatives, limited use has been seen at this time.
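The read/write-bit conflict check, and the false conflicts that cache line granularity can cause, can be modeled in software. The sketch below is a simplified illustration rather than a description of any specific processor: the 64-byte line size, the `TxCache` class and its method names are all assumptions. It tracks which lines a running transaction has read or written and decides whether an incoming request from another core conflicts.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for illustration)


class TxCache:
    """Per-line read and write bits for one core's running transaction."""

    def __init__(self):
        self.read_bits = set()   # line indices with the read bit set
        self.write_bits = set()  # line indices with the write bit set

    @staticmethod
    def line_of(addr):
        return addr // LINE_SIZE

    def load(self, addr):
        """Transactional load: set the read bit on the containing line."""
        self.read_bits.add(self.line_of(addr))

    def store(self, addr):
        """Transactional store: set the write bit on the containing line."""
        self.write_bits.add(self.line_of(addr))

    def conflicts_with(self, addr, is_write):
        """Check a coherence request from another core against the bits.

        A remote write conflicts with any line this transaction has read
        or written; a remote read conflicts only with lines it has written.
        """
        line = self.line_of(addr)
        if is_write:
            return line in self.read_bits or line in self.write_bits
        return line in self.write_bits


tx = TxCache()
tx.load(0x1000)

# Different word, same 64-byte line: reported as a conflict even though
# the two threads touch disjoint data (a false conflict).
false_conflict = tx.conflicts_with(0x1008, is_write=True)

# The next line over is tracked separately, so no conflict is reported.
no_conflict = tx.conflicts_with(0x1040, is_write=True)
```

Tracking at finer granularity would avoid the false conflict but would need more bits per line, which is one reason structure padding and data layout matter for software running on such hardware.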

Software transactional memory provides transactional memory semantics in a software runtime library or the programming language,^[9] and requires minimal hardware support (typically an atomic compare-and-swap operation, or equivalent). The downside is that software implementations usually carry a performance penalty compared to hardware solutions. Hardware acceleration can reduce some of the overheads associated with software transactional memory.
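A compare-and-swap retry loop is the smallest building block of this kind: read a word, compute a new value, and publish it only if the word is unchanged, retrying otherwise. The sketch below emulates this in Python. Since Python exposes no hardware compare-and-swap, the `AtomicCell` class is a hypothetical stand-in that uses a lock to model the atomicity the hardware instruction would provide. It compares by value, so it does not address the ABA problem.

```python
import threading


class AtomicCell:
    """Emulated machine word with compare-and-swap.

    The internal lock only models the atomicity of a hardware CAS
    instruction; it is not held across the caller's computation.
    """

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_update(cell, fn):
    """Apply fn to the cell's value as a single-word 'transaction'."""
    while True:
        old = cell.load()            # speculative read
        new = fn(old)                # compute without holding any lock
        if cell.compare_and_swap(old, new):
            return new               # commit succeeded
        # Another thread changed the cell in between: retry.


def run_counter_demo(threads=4, increments=1000):
    cell = AtomicCell(0)

    def worker():
        for _ in range(increments):
            atomic_update(cell, lambda v: v + 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.load()
```

A software transactional memory runtime generalizes this pattern from one word to a set of locations, which is where most of its bookkeeping overhead comes from.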

Owing to the more limited nature of hardware transactional memory (in current implementations), software using it may require fairly extensive tuning to fully benefit from it. For example, the dynamic memory allocator may have a significant influence on performance and likewise structure padding may affect performance (owing to cache alignment and false sharing issues); in the context of a virtual machine, various background threads may cause unexpected transaction aborts.^[10]

History

One of the earliest implementations of transactional memory was the gated store buffer used in Transmeta's Crusoe and Efficeon processors. However, this was only used to facilitate speculative optimizations for binary translation, rather than any form of speculative multithreading, or exposing it directly to programmers. Azul Systems also implemented hardware transactional memory to accelerate their Java appliances, but this was similarly hidden from outsiders.^[11]

Sun Microsystems implemented hardware transactional memory and a limited form of speculative multithreading in its high-end Rock processor. This implementation proved that it could be used for lock elision and more complex hybrid transactional memory systems, where transactions are handled with a combination of hardware and software. The Rock processor was canceled in 2009, just before the acquisition by Oracle; while the actual products were never released, a number of prototype systems were available to researchers.^[11]

In 2009, AMD proposed the Advanced Synchronization Facility (ASF), a set of x86 extensions that provide a very limited form of hardware transactional memory support. The goal was to provide hardware primitives that could be used for higher-level synchronization, such as software transactional memory or lock-free algorithms. However, AMD has not announced whether ASF will be used in products, and if so, in what timeframe.^[11]

More recently, IBM announced in 2011 that Blue Gene/Q had hardware support for both transactional memory and speculative multithreading. The transactional memory could be configured in two modes; the first is an unordered and single-version mode, where a write from one transaction causes a conflict with any transactions reading the same memory address. The second mode is for speculative multithreading, providing an ordered, multi-versioned transactional memory. Speculative threads can have different versions of the same memory address, and hardware implementation keeps track of the age for each thread. The younger threads can access data from older threads (but not the other way around), and writes to the same address are based on the thread order. In some cases, dependencies between threads can cause the younger versions to abort.^[11]

Intel's Transactional Synchronization Extensions (TSX) is available in some Skylake processors. It was implemented earlier in Haswell and Broadwell processors as well, but both implementations turned out to be defective and support for TSX was disabled. The TSX specification describes the transactional memory API for use by software developers, but withholds details on technical implementation.^[11] The ARM architecture has a similar extension.^[12]

As of GCC 4.7, an experimental library for transactional memory is available which utilizes a hybrid implementation. The PyPy variant of Python also introduces transactional memory to the language.

Available implementations

Hardware:
- Arm Transactional Memory Extension (TME)^[13]
- Blue Gene/Q processor from IBM (Sequoia supercomputer)^[14]
- IBM zEnterprise EC12, the first commercial server to include transactional memory processor instructions
- Intel's Transactional Synchronization Extensions (TSX), available in select Haswell-based processors and newer until it was removed in Comet Lake
- IBM POWER8 and 9, removed in Power10 (Power ISA v.3.1)^[15]^[16]^[17]
- Rock processor (canceled by Oracle)
- Vega 2 from Azul Systems^[18]

Software:
- STM Monad in the Glasgow Haskell Compiler^[19]
- STMX in Common Lisp^[20]
- Refs in Clojure
- gcc 4.7+ for C/C++^[21]^[22]^[23]^[24]
- PyPy^[25]
- Part of the picotm Transaction Framework for C^[26]
- The TVar in concurrent-ruby, a concurrency library for Ruby^[27]
- Verse^[28]

References

[1] Harris, Tim; Larus, James; Rajwar, Ravi (2010-06-02). "Transactional Memory, 2nd edition". Synthesis Lectures on Computer Architecture. 5 (1): 1–263. doi:10.2200/S00272ED1V01Y201006CAC011. ISSN 1935-3235.
[2] "Transactional Memory: History and Development". Kukuruku Hub. Retrieved 2016-11-16.
[3] Solihin, Yan (2016). Fundamentals of Parallel Multicore Architecture. Berkeley, California: Chapman & Hall. pp. 287–292. ISBN 978-1-4822-1118-4.
[4] Herlihy, Maurice; Moss, J. Eliot B. (1993). "Transactional memory: Architectural support for lock-free data structures" (PDF). Proceedings of the 20th International Symposium on Computer Architecture (ISCA). pp. 289–300.
[5] Stone, J.M.; Stone, H.S.; Heidelberger, P.; Turek, J. (1993). "Multiple Reservations and the Oklahoma Update". IEEE Parallel & Distributed Technology: Systems & Applications. 1 (4): 58–71. doi:10.1109/88.260295. S2CID 11017196.
[6] Hammond, L; Wong, V.; Chen, M.; Carlstrom, B.D.; Davis, J.D.; Hertzberg, B.; Prabhu, M.K.; Honggo Wijaya; Kozyrakis, C.; Olukotun, K. (2004). "Transactional memory coherence and consistency". Proceedings of the 31st annual International Symposium on Computer Architecture (ISCA). pp. 102–13. doi:10.1109/ISCA.2004.1310767.
[7] Ananian, C.S.; Asanovic, K.; Kuszmaul, B.C.; Leiserson, C.E.; Lie, S. (2005). "Unbounded transactional memory". 11th International Symposium on High-Performance Computer Architecture. pp. 316–327. doi:10.1109/HPCA.2005.41. ISBN 0-7695-2275-0.
[8] "LogTM: Log-based transactional memory" (PDF). WISC.
[9] "The ATOMOΣ Transactional Programming Language" (PDF). Stanford. Archived from the original (PDF) on 2008-05-21. Retrieved 2009-06-15.
[10] Odaira, R.; Castanos, J. G.; Nakaike, T. (2013). "Do C and Java programs scale differently on Hardware Transactional Memory?". 2013 IEEE International Symposium on Workload Characterization (IISWC). p. 34. doi:10.1109/IISWC.2013.6704668. ISBN 978-1-4799-0555-3.
[11] David Kanter (2012-08-21). "Analysis of Haswell's Transactional Memory". Real World Technologies. Retrieved 2013-11-19.
[12] "Arm releases SVE2 and TME for A-profile architecture - Processors blog - Processors - Arm Community". community.arm.com. 18 April 2019. Retrieved 2019-05-25.
[13] "Transactional Memory Extension (TME) intrinsics". Retrieved 2020-05-05.
[14] "IBM plants transactional memory in CPU". EE Times.
[15] Brian Hall; Ryan Arnold; Peter Bergner; Wainer dos Santos Moschetta; Robert Enenkel; Pat Haugen; Michael R. Meissner; Alex Mericas; Philipp Oehler; Berni Schiefer; Brian F. Veale; Suresh Warrier; Daniel Zabawa; Adhemerval Zanella (2014). Performance Optimization and Tuning Techniques for IBM Processors, including IBM POWER8 (PDF). IBM Redbooks. pp. 37–40. ISBN 978-0-7384-3972-3.
[16] Wei Li, IBM XL compiler hardware transactional memory built-in functions for IBM AIX on IBM POWER8 processor-based systems
[17] "Power ISA Version 3.1". openpowerfoundation.org. 2020-05-01. Retrieved 2020-10-10.
[18] Java on a 1000 Cores – Tales of Hardware/Software CoDesign on YouTube
[19] "Control.Monad.STM". hackage.haskell.org. Retrieved 2020-02-06.
[20] "STMX Homepage".
[21] Wong, Michael. "Transactional Language Constructs for C++" (PDF). Retrieved 12 Jan 2011.
[22] "Brief Transactional Memory GCC tutorial".
[23] "C Dialect Options - Using the GNU Compiler Collection (GCC)".
[24] "TransactionalMemory - GCC Wiki".
[25] Rigo, Armin. "Using All These Cores: Transactional Memory in PyPy". europython.eu. Retrieved 7 April 2015.
[26] "picotm - Portable Integrated Customizable and Open Transaction Manager".
[27] "Concurrent::TVar".
[28] Pizlo, Phil (2024-03-15). "Bringing Verse Transactional Memory Semantics to C++". Retrieved 2024-08-18.

External links

Michael Neuling (IBM), "What's the deal with Hardware Transactional Memory!?!" introductory talk at linux.conf.au 2014
Transactional Memory Online: Categorized bibliography about transactional memory

