Convert ARM Assembly Code for Apple’s iOS Assembler
Recently I’ve been working on getting OCaml 4.00.0 working on iOS. As I write this, 4.00.0 is the newest version of OCaml, not yet released but available as a beta. I’m treating it as a new project, not trying to re-use any of the patches we’ve been using for OCaml 3.10.2.
The first interesting problem I hit is that Apple’s ARM assembler for
iOS (called “as”, the traditional Unix name) is quite different from
other ARM assemblers. Although it derives ultimately from the same GNU
codebase, it appears the Apple assembler split off many years ago and
has followed a separate evolutionary path.
This needs to be solved for an OCaml-on-iOS port, because part of the
OCaml runtime is written in assembly code, in a file named arm.S. For
the work on OCaml 3.10.2, we rewrote arm.S extensively by hand. This
time, I decided to write a Python script to convert ARM assembly code
from the current GNU format to the format used by Apple’s iOS assembler.
This keeps the changes consistent, and it ought to help when arm.S is
rewritten in the future.
Note: I wrote a new, improved version of this script, described in Convert Linux ARM Assembly Code for iOS (Update 3).
So now I have a script named arm-as-to-ios that works well enough to
convert arm.S to a form that can be assembled for iOS. It’s nothing
fancy; currently it just makes the following changes:
- Replace uses of =value notation with explicit loads from memory. The usual ARM assemblers interpret ldr rM, =value to mean that the value should be loaded into register M immediately (using mov) if possible, and loaded from memory (using ldr with PC-relative addressing) otherwise. The Apple assembler seems not to support this. arm-as-to-ios replaces uses of =value with explicit memory loads, emitting the pool of values into the .text segment at the end of the file.
- Remove uses of two pseudo-ops, .type and .size. They aren’t supported by Apple’s assembler. This is done by defining null macros for them.
- Define a macro cbz. The cbz instruction is Thumb-only. This definition replaces it with a pair of ARM instructions. Note that the macro parameter syntax of Apple’s assembler is different from (possibly just more restrictive than) that of the usual GNU tools.
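To make the first change concrete, here is a minimal sketch of the rewrite applied to a couple of lines. The regular expression and the PL prefix are taken from the script; the helper name rewrite_ldr and the sample input are made up for illustration, and the sketch works on plain instruction strings rather than the (space, instruction, end) triples the real script uses.

```python
import re

# Pattern and prefix as used in the script; everything else is illustrative.
LDR_RE = r'(ldr[ \t][^,]*,[ \t]*)=(([^ \t\n@,/]|/(?!\*))*)(.*)'
PREFIX = 'PL'

def rewrite_ldr(lines):
    """Rewrite 'ldr rM, =sym' lines and append a pool of the symbols used."""
    syms = []   # symbols in order of first use
    out = []
    for line in lines:
        mo = re.match(LDR_RE, line, re.DOTALL)
        if mo:
            sym = mo.group(2)
            if sym not in syms:
                syms.append(sym)
            # Load from the pool label instead of using =sym
            out.append(mo.group(1) + PREFIX + sym + mo.group(4))
        else:
            out.append(line)
    if syms:
        # Emit the pool into the text segment after all instructions
        out.append('.text')
        out.append('.align 2')
        for sym in syms:
            out.append(PREFIX + sym + ':')
            out.append('.long ' + sym)
    return out

# Hypothetical input lines, chosen only to exercise the rewrite
converted = rewrite_ldr(['ldr r12, =caml_last_return_address',
                         'str lr, [r12]'])
print('\n'.join(converted))
```

The ldr line comes out as a load from the label PLcaml_last_return_address, and the label itself is defined at the end with a .long holding the symbol's address.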
Another advantage of using a script is that it might be useful to other people who need to port assembly code to iOS. Granted, there probably aren’t a lot of people doing this. But if you are, maybe the script will provide a useful starting point.
You can download the script here:
I have no doubt that I’ll need to update the script as the project progresses. I’ll keep the linked script up to date. If there are large changes I’ll make another post about them.
Here, also, is the current text of the script:
#!/usr/bin/env python
#
# arm-as-to-ios Modify ARM assembly code for the iOS assembler
#
# Copyright (c) 2012 Psellos http://psellos.com/
# Licensed under the MIT License:
# http://www.opensource.org/licenses/mit-license.php
#
# Resources for running OCaml on iOS: http://psellos.com/ocaml/
#
import sys
import re
VERSION = '1.0.0'
def add_macro_defs(instrs):
    # Emit compatibility macros.
    #
    # cbz: Thumb only; replace with cmp/beq for ARM
    # .type: Not supported by Apple assembler
    # .size: Not supported by Apple assembler
    #
    skippable = '$|\.syntax[ \t]'
    i = 0
    for i in range(len(instrs)):
        if not re.match(skippable, instrs[i][1]):
            break
    instrs[i:0] = [
        ('', '', '\n'),
        ('/* Apple compatibility macros */', '', '\n'),
        (' ', '.macro cbz', '\n'),
        (' ', 'cmp $0, #0', '\n'),
        (' ', 'beq $1', '\n'),
        (' ', '.endm', '\n'),
        (' ', '.macro .type', '\n'),
        (' ', '.endm', '\n'),
        (' ', '.macro .size', '\n'),
        (' ', '.endm', '\n'),
        ('', '', '\n')
    ]
    return instrs
# Prefix for derived symbols
#
g_prefix = 'PL'
# Regular expression for modified ldr lines
#
g_ldre = '(ldr[ \t][^,]*,[ \t]*)=(([^ \t\n@,/]|/(?!\*))*)(.*)'
def explicit_address_loads(instrs):
    # The Gnu assembler allows the following:
    #
    #     ldr rM, =symbol
    #
    # which loads rM with [mov] (immediately) if possible, or creates an
    # entry in memory for the symbol value and loads it PC-relatively
    # with [ldr].
    #
    # The Apple assembler doesn't seem to support this notation. If the
    # value is a suitable constant, it emits a valid [mov]. Otherwise
    # it seems to emit an invalid [ldr] that always generates an error.
    # (At least I have not been able to make it work). So, change uses
    # of =symbol to explicit PC-relative loads.
    #
    # This requires a pool containing the addresses to be loaded. For
    # now, we just keep track of it ourselves and emit it into the text
    # segment at the end of the file.
    syms = {}
    result = []
    def change1((syms, result), (a, b, c)):
        global g_prefix
        global g_ldre
        mo = re.match(g_ldre, b, re.DOTALL)
        if mo:
            if mo.group(2) not in syms:
                syms[mo.group(2)] = len(syms)
            newb = (mo.group(1) + g_prefix + mo.group(2) + mo.group(4))
            result.append((a, newb, c))
        else:
            result.append((a, b, c))
        return (syms, result)
    def pool1(result, s):
        global g_prefix
        result.append(('', g_prefix + s + ':', '\n'))
        result.append((' ', '.long ' + s, '\n'))
        return result
    reduce(change1, instrs, (syms, result))
    if len(syms) > 0:
        result.append(('', '', '\n'))
        result.append(('/* Pool of addresses loaded into registers */',
                       '', '\n'))
        result.append(('', '', '\n'))
        result.append((' ', '.text', '\n'))
        result.append((' ', '.align 2', '\n'))
        reduce(pool1, sorted(syms, key=syms.get), result)
    return result
def read_input():
    # Concatenate all the input files into a string.
    #
    def fnl(s):
        if s == '' or s[-1] == '\n':
            return s
        else:
            return s + '\n'
    if len(sys.argv) < 2:
        return fnl(sys.stdin.read())
    else:
        input = ""
        for f in sys.argv[1:]:
            try:
                fd = open(f)
                input = input + fnl(fd.read())
                fd.close()
            except:
                sys.stderr.write('arm-as-to-ios: cannot open ' + f + '\n')
        return input
def parse_instrs(s):
    # Parse the string into assembly instructions while tolerating C
    # preprocessor lines. Each instruction is represented as a triple:
    # (space/comments, instruction, end). The end is either ';' or
    # '\n'. Instructions can have embedded comments, but they won't get
    # fixed up if they do. (I've never seen it in real code.)
    #
    def goodmo(mo):
        if mo == None:
            # Should never happen
            sys.stderr.write('arm-as-to-ios: internal parsing error\n')
            sys.exit(1)
    cpp_re = '([ \t]*#([^\n]*\\\\\n)*[^\n]*[^\\\\\n])\n'
    instr_re = (
        '(([ \t]|/\*.*?\*/|@[^\n]*)*)' # Spaces & comments
        '(([ \t]|/\*.*?\*/|[^;\n])*)' # "Instruction"
        '([;\n])' # End
    )
    instrs = []
    while s != '':
        if re.match('[ \t]*#', s):
            mo = re.match(cpp_re, s)
            goodmo(mo)
            instrs.append((mo.group(1), '', '\n'))
        else:
            mo = re.match(instr_re, s, re.DOTALL)
            goodmo(mo)
            instrs.append((mo.group(1), mo.group(3), mo.group(5)))
        s = s[len(mo.group(0)):]
    return instrs
def main():
    instrs = parse_instrs(read_input())
    instrs = add_macro_defs(instrs)
    instrs = explicit_address_loads(instrs)
    for (a, b, c) in instrs:
        sys.stdout.write(a + b + c)
main()
Copy and paste the lines into a file named arm-as-to-ios (or download
it from the above link). Mark it as a script with chmod:
$ chmod +x arm-as-to-ios
To use the script, specify the name of an ARM assembly file. If no
files are given, the script processes its standard input. The following
shows a successful assembly of arm.S:
$ PLT=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform
$ PLTBIN=$PLT/Developer/usr/bin
$ arm-as-to-ios asmrun/arm.S | cpp > armios.S
$ $PLTBIN/as -arch armv6 -o armios.o armios.S
$ file armios.o
armios.o: Mach-O object arm
$ otool -tv armios.o | head
armios.o:
(__TEXT,__text) section
caml_call_gc:
00000000 e59fc2a0 ldr ip, [pc, #672] @ 0x2a8
00000004 e58ce000 str lr, [ip]
.Lcaml_call_gc:
00000008 e59fc29c ldr ip, [pc, #668] @ 0x2ac
0000000c e58cd000 str sp, [ip]
00000010 ed2d0b10 vstmdb sp!, {d0-d7}
00000014 e92d50ff push {r0, r1, r2, r3, r4, r5, r6, r7, ip, lr}
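The two ldr instructions in this listing appear to be the explicit pool loads generated by the script. As a quick check on the disassembly, the sketch below recomputes the addresses otool prints after the @ sign. It relies on the fact that in ARM (non-Thumb) state, reading pc yields the address of the current instruction plus 8; the function name is made up for illustration.

```python
def pc_relative_target(instr_addr, offset):
    # In ARM state, pc reads as the instruction's own address plus 8,
    # so [pc, #offset] refers to instr_addr + 8 + offset.
    return instr_addr + 8 + offset

# Addresses and offsets taken from the otool output above
print(hex(pc_relative_target(0x00000000, 672)))  # 0x2a8
print(hex(pc_relative_target(0x00000008, 668)))  # 0x2ac
```

The two targets are adjacent 4-byte words, which is consistent with consecutive .long entries in the pool emitted at the end of the text segment.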
If you have any corrections, improvements, or other comments, leave them below or email me at jeffsco@psellos.com. I’d be very pleased to hear if the script has been helpful to anyone.
Posted by: Jeffrey