Convert Linux ARM Assembly Code for iOS (Update 2)
While getting OCaml 4.00.0 working on iOS, I’ve learned quite a bit about the differences between the Linux ARM assembler and the iOS ARM assembler. The two are related: the iOS assembler is like a cousin of the Linux assembler whose ancestors left the home country a few generations back. However, the differences between the two were one of the main problems I had to solve.
This information was hard won: documentation for the iOS assembler seems rare to nonexistent (email me if you know where to find it). So I spent many hours reading the source code of as(1) at Apple’s open source site. To help others avoid duplicating the effort, I’m publishing here the latest version of my Python script arm-as-to-ios that converts ARM assembly code from the Linux format to the required iOS format. This script contains all the wisdom I’ve accumulated so far (including some that’s not required for the OCaml-on-iOS project).
Note: I wrote a new, improved version of this script, described in Convert Linux ARM Assembly Code for iOS (Update 3).
I wrote the script specifically to convert the hand-written ARM assembly files of the OCaml runtime to work on iOS. The advantages of using a script are that the conversion is done consistently, and the script might still be useful if the assembly code is rewritten in the future.
Another advantage is that the script might help other people who need to port ARM assembly code from Linux to iOS. Granted, there probably aren’t a lot of people doing this. But if you are, maybe the script will be a useful starting point or at least a careful list of differences between the assemblers.
The output of the script is upward compatible. By this I mean that a converted assembly file is supposed to work in its originally supported Linux environments as well as in iOS. In theory this means that the converted assembly files could be sent “upstream” to maintainers (if they’re comfortable with supporting iOS and/or the extra complexity of the code).
The current version of arm-as-to-ios is 1.3.0. In its current form, it does the following conversions:
- Declare the ARM architecture for iOS with a .machine pseudo-op. The possibilities are armv6 and armv7.
- Specify that armv7 code should use the more space-efficient Thumb encoding in iOS.
- Generate declarations for functions, similar to the .type pseudo-op for Linux assemblers. It appears that Apple requires .thumb_func declarations only for Thumb functions. Other symbols are apparently assumed to be ARM functions. To make this work upward-compatibly, I define an assembly macro named .funtype that is expanded properly in the different environments.
- Make sure that global symbols have the right format. For Linux, they look like “abc”. For iOS, they look like “_abc” (with a leading underscore). To support upward compatibility, this is handled by a cpp macro named Glo() that generates the proper symbol form for each environment.
- Make sure that assembler-local symbols have the right format. For Linux assemblers, they look like “.Lxxx”. For the iOS assembler, they look like “Lxxx”. To support upward compatibility, this is handled by a cpp macro named Loc() that generates the proper symbol form for each environment.
- Replace uses of =value notation by explicit loads from memory. The Linux ARM assemblers interpret ldr rM, =value to mean that the value should be loaded into register M immediately (using mov) if possible, and loaded from memory (using ldr with PC-relative addressing) otherwise. The Apple assembler seems not to support this, so arm-as-to-ios replaces uses of =value with explicit memory loads, emitting the pool of values into the .text segment at the end of the file.
- Convert jump tables (for the tbh instruction) to a form that works with the iOS assembler. The form commonly used in Linux generates a bad jump table under iOS; apparently the iOS assembler interprets the expressions differently.
- Remove uses of two pseudo-ops, .type and .size, when assembling for iOS. They aren’t supported by Apple’s assembler. This is done by defining null assembly macros for them.
- Define a macro cbz when using ARM encodings for iOS. The cbz instruction is Thumb-only. The definition replaces it with a pair of ARM instructions.
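To make the =value conversion more concrete, here is a minimal sketch of the idea. It is not the code from arm-as-to-ios: it is written for Python 3, works on plain text lines rather than the script's parsed instructions, handles only symbol values (not numeric constants), and the names POOL_PREFIX, pool_label and rewrite_loads are made up for this illustration.

```python
import re

# Hypothetical pool-label prefix, chosen for this sketch only.
POOL_PREFIX = ".LP"

# Matches "ldr rM, =symbol": text up to the "=", the symbol, and the rest.
LDR_EQ = re.compile(r"(\s*ldr\s+[^,]+,\s*)=([A-Za-z_.][\w.]*)(.*)")


def pool_label(value):
    """Build a pool label for a value, dropping any leading '.L'."""
    if value.startswith(".L"):
        value = value[2:]
    return POOL_PREFIX + value


def rewrite_loads(lines):
    """Replace 'ldr rM, =symbol' with a load from a pool entry.

    Returns the rewritten lines followed by the pool of addresses.
    """
    pooled = []  # symbols in first-seen order, without duplicates
    out = []
    for line in lines:
        m = LDR_EQ.match(line)
        if m:
            value = m.group(2)
            if value not in pooled:
                pooled.append(value)
            line = m.group(1) + pool_label(value) + m.group(3)
        out.append(line)
    if pooled:
        # Emit the pool into the text segment after the code.
        out += ["    .text", "    .align 2"]
        for value in pooled:
            out.append(pool_label(value) + ":")
            out.append("    .long " + value)
    return out


source = [
    "    ldr r7, =last_return_address",
    "    str lr, [r7]",
    "    ldr r12, =bottom_of_stack",
]
converted = rewrite_loads(source)
print("\n".join(converted))
```

Each load now names a label in the pool, and the pool entry holds the address with a .long directive, so the assembler only has to handle an ordinary PC-relative ldr.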
You can download the script here:
The full text of the script is also included at the end of this post.
This is the third version of arm-as-to-ios, and there may well be a few more changes when I work on the armv6 architecture. As I make changes, I’ll keep the linked script up to date. If there are significant changes I’ll make another post about them.
If you want to try out arm-as-to-ios, copy and paste the lines from the end of this post into a file named arm-as-to-ios, or download it from the above link. Mark it as a script with chmod:
$ chmod +x arm-as-to-ios
To use the script, specify the name of an ARM assembly file. If no files are given, the script processes its standard input.
The following small example demonstrates the translations that arm-as-to-ios performs. Here is a small file of Linux ARM assembly code:
    .syntax unified
    .text
    .align 2
    .globl example
    .type example, %function
example:
    sub r10, r10, 8
    cmp r10, r11
    bcc 1f
    bx lr
1:
    ldr r7, =last_return_address
    str lr, [r7]
    bl .Lcall_gc
    ldr lr, [r7]
    b example
.Lcall_gc:
    ldr r12, =bottom_of_stack
    str sp, [r12]
    bl garbage_collection
    bx lr
.Ljump:
    # A jump table (code fragment).
    tbh [pc, r4, lsl #1]
    .short (.Ltest1-.)/2+0
    .short (.Ltest2-.)/2+1
    .short (.Ltest3-.)/2+2
.Ltest1:
    ldr r4, [r4]
.Ltest2:
    str r6, [sp, 8]
.Ltest3:
    str r8, [r12]
If you run arm-as-to-ios on this file, you get the following output that works for both Linux and iOS assemblers:
    .syntax unified
/* Apple compatibility macros */
#if defined(SYS_macosx)
#define Glo(s) _##s
#define Loc(s) L##s
#if defined(MODEL_armv6)
    .machine armv6
    .macro .funtype
    .endm
    .macro cbz
    cmp $0, #0
    beq $1
    .endm
#else
    .machine armv7
    .thumb
    .macro .funtype
    .thumb_func $0
    .endm
#endif
    .macro .type
    .endm
    .macro .size
    .endm
#else
#define Glo(s) s
#define Loc(s) .L##s
    .macro .funtype symbol
    .type \symbol, %function
    .endm
#endif
/* End Apple compatibility macros */
    .text
    .align 2
    .globl Glo(example)
    .funtype Glo(example)
Glo(example):
    sub r10, r10, 8
    cmp r10, r11
    bcc 1f
    bx lr
1:
    ldr r7, Loc(Plast_return_address)
    str lr, [r7]
    bl Loc(call_gc)
    ldr lr, [r7]
    b Glo(example)
Loc(call_gc):
    ldr r12, Loc(Pbottom_of_stack)
    str sp, [r12]
    bl Glo(garbage_collection)
    bx lr
Loc(jump):
    # A jump table (code fragment).
    tbh [pc, r4, lsl #1]
Loc(B27):
    .short (Loc(test1)-Loc(B27))/2
    .short (Loc(test2)-Loc(B27))/2
    .short (Loc(test3)-Loc(B27))/2
Loc(test1):
    ldr r4, [r4]
Loc(test2):
    str r6, [sp, 8]
Loc(test3):
    str r8, [r12]
/* Pool of addresses loaded into registers */
    .text
    .align 2
Loc(Plast_return_address):
    .long Glo(last_return_address)
Loc(Pbottom_of_stack):
    .long Glo(bottom_of_stack)
The output consists of a fixed prefix followed by the translation of the input file, followed by the pool of values to be loaded into registers.
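The jump table rewrite visible in this output can be sketched in a few lines. In the Linux form, entry number k is written as (symbol-.)/2+k. As I read the conversion, "." there stands for the address of the entry itself, which lies 2k bytes past the start of the table, so the same value can be written as (symbol-base)/2 against an explicit base label. The sketch below is Python 3 and much simpler than the script: it assumes one table per input, entries numbered from 0 in order, and a made-up base label name.

```python
import re

# Matches ".short (symbol-.)/2+k", where k is the entry's index in the table.
SHORT_RE = re.compile(r"(\s*)\.short\s+\(([^\s()+*/-]+)-\.\)/2\+(\d+)\s*$")


def rewrite_jump_table(lines, base_label=".LBtable"):
    """Rewrite '.'-relative jump table entries against an explicit base label.

    base_label is a hypothetical name for this sketch. A single table per
    input is assumed, with entry k being the k-th .short of that table.
    """
    out = []
    for line in lines:
        m = SHORT_RE.match(line)
        if m:
            indent, symbol, k = m.group(1), m.group(2), int(m.group(3))
            if k == 0:
                # The first entry marks the start of the table.
                out.append(base_label + ":")
            line = indent + ".short (" + symbol + "-" + base_label + ")/2"
        out.append(line)
    return out


table = [
    "    tbh [pc, r4, lsl #1]",
    "    .short (.Ltest1-.)/2+0",
    "    .short (.Ltest2-.)/2+1",
    "    .short (.Ltest3-.)/2+2",
]
rewritten = rewrite_jump_table(table)
print("\n".join(rewritten))
```

The real script is more careful: it tracks runs of .short directives, reuses an existing label as the base when there is one, and otherwise generates a base label from the instruction's position (B27 in the output above).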
The following shows a successful assembly of the arm.S file from OCaml 4.00.0:
$ PLT=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform
$ PLTBIN=$PLT/Developer/usr/bin
$ arm-as-to-ios asmrun/arm.S > armios.S
$ $PLTBIN/gcc -c -arch armv7 -DSYS_macosx -DMODEL_armv7 -o armios.o armios.S
$ file armios.o
armios.o: Mach-O object arm
$ otool -tv armios.o | head
armios.o:
(__TEXT,__text) section
caml_call_gc:
00000000 f8dfc1e0 ldr.w ip, [pc, #480] @ 0x1e4
00000004 f8cce000 str.w lr, [ip]
00000008 f8dfc1dc ldr.w ip, [pc, #476] @ 0x1e8
0000000c f8ccd000 str.w sp, [ip]
00000010 ed2d0b10 vstmdb sp!, {d0-d7}
00000014 e92d50ff stmdb sp!, {r0, r1, r2, r3, r4, r5, r6, r7, ip, lr}
00000018 f8dfc1d0 ldr.w ip, [pc, #464] @ 0x1ec
If you have any corrections, improvements, or other comments, leave them below or email me at jeffsco@psellos.com. I’d be very pleased to hear if the script has been helpful to anyone.
Posted by: Jeffrey
Appendix
Here is the current text of the script:
#!/usr/bin/env python
#
# arm-as-to-ios Modify ARM assembly code for the iOS assembler
#
# Copyright (c) 2012 Psellos http://psellos.com/
# Licensed under the MIT License:
# http://www.opensource.org/licenses/mit-license.php
#
# Resources for running OCaml on iOS: http://psellos.com/ocaml/
#
import sys
import re
VERSION = '1.3.0'
# Prefixes for pooled symbol labels and jump table base labels. They're
# in the space of Linux assembler local symbols. Later rules will
# modify them to the Loc() form.
#
g_poolpfx = '.LP'
g_basepfx = '.LB'

def add_prefix(instrs):
    # Add compatibility macros for all systems, plus hardware
    # definitions and compatibility macros for iOS.
    #
    # All systems:
    #
    # Glo()     cpp macro for making global symbols (xxx vs _xxx)
    # Loc()     cpp macro for making local symbols (.Lxxx vs Lxxx)
    # .funtype  Expands to .thumb_func for iOS armv7 (null for armv6)
    #           Expands to .type %function for others
    #
    # iOS:
    #
    # .machine  armv6/armv7
    # .thumb    (for armv7)
    # cbz       Expands to cmp/beq for armv6 (Thumb-only instr)
    # .type     Not supported by Apple assembler
    # .size     Not supported by Apple assembler
    #
    defre = '#[ \t]*if.*def.*SYS' # Add new defs near first existing ones
    skipre = '$|\.syntax[ \t]' # Skip comment lines (and .syntax)

    for i in range(len(instrs)):
        if re.match(defre, instrs[i][1]):
            break
    else:
        i = 0
    for i in range(i, len(instrs)):
        if not re.match(skipre, instrs[i][1]):
            break
    instrs[i:0] = [
        ('', '', '\n'),
        ('/* Apple compatibility macros */', '', '\n'),
        ('', '#if defined(SYS_macosx)', '\n'),
        ('', '#define Glo(s) _##s', '\n'),
        ('', '#define Loc(s) L##s', '\n'),
        ('', '#if defined(MODEL_armv6)', '\n'),
        (' ', '.machine armv6', '\n'),
        (' ', '.macro .funtype', '\n'),
        (' ', '.endm', '\n'),
        (' ', '.macro cbz', '\n'),
        (' ', 'cmp $0, #0', '\n'),
        (' ', 'beq $1', '\n'),
        (' ', '.endm', '\n'),
        ('', '#else', '\n'),
        (' ', '.machine armv7', '\n'),
        (' ', '.thumb', '\n'),
        (' ', '.macro .funtype', '\n'),
        (' ', '.thumb_func $0', '\n'),
        (' ', '.endm', '\n'),
        ('', '#endif', '\n'),
        (' ', '.macro .type', '\n'),
        (' ', '.endm', '\n'),
        (' ', '.macro .size', '\n'),
        (' ', '.endm', '\n'),
        ('', '#else', '\n'),
        ('', '#define Glo(s) s', '\n'),
        ('', '#define Loc(s) .L##s', '\n'),
        (' ', '.macro .funtype symbol', '\n'),
        (' ', '.type \\symbol, %function', '\n'),
        (' ', '.endm', '\n'),
        ('', '#endif', '\n'),
        ('/* End Apple compatibility macros */', '', '\n'),
        ('', '', '\n')
    ]
    return instrs

# Regular expression for modified ldr lines
#
g_ldre = '(ldr[ \t][^,]*,[ \t]*)=(([^ \t\n@,/]|/(?!\*))*)(.*)'

def explicit_address_loads(instrs):
    # Linux assemblers allow the following:
    #
    #     ldr rM, =symbol
    #
    # which loads rM with [mov] (immediately) if possible, or creates an
    # entry in memory for the symbol value and loads it PC-relatively
    # with [ldr].
    #
    # The Apple assembler doesn't seem to support this notation. If the
    # value is a suitable constant, it emits a valid [mov]. Otherwise
    # it seems to emit an invalid [ldr] that always generates an error.
    # (At least I have not been able to make it work). So, change uses
    # of =symbol to explicit PC-relative loads.
    #
    # This requires a pool containing the addresses to be loaded. For
    # now, we just keep track of it ourselves and emit it into the text
    # segment at the end of the file.
    #
    syms = {}
    result = []

    def repl1((syms, result), (a, b, c)):
        global g_poolpfx
        global g_ldre
        (b1, b2, b3) = parse_iparts(b)
        mo = re.match(g_ldre, b3, re.DOTALL)
        if mo:
            if mo.group(2) not in syms:
                syms[mo.group(2)] = len(syms)
            psym = mo.group(2)
            if psym[0:2] == '.L':
                psym = psym[2:]
            newb3 = mo.group(1) + g_poolpfx + psym + mo.group(4)
            result.append((a, b1 + b2 + newb3, c))
        else:
            result.append((a, b, c))
        return (syms, result)

    def pool1(result, s):
        global g_poolpfx
        psym = s
        if psym[0:2] == '.L':
            psym = psym[2:]
        result.append(('', g_poolpfx + psym + ':', '\n'))
        result.append((' ', '.long ' + s, '\n'))
        return result

    reduce(repl1, instrs, (syms, result))
    if len(syms) > 0:
        result.append(('', '', '\n'))
        result.append(('/* Pool of addresses loaded into registers */',
                       '', '\n'))
        result.append(('', '', '\n'))
        result.append((' ', '.text', '\n'))
        result.append((' ', '.align 2', '\n'))
        reduce(pool1, sorted(syms, key=syms.get), result)
    return result

def global_symbols(instrs):
    # The form of a global symbol differs between Linux assemblers and
    # the Apple assembler:
    #
    #     Linux: xxx
    #     Apple: _xxx
    #
    # Change occurrences of global symbols to use the Glo() cpp macro
    # defined in our prefix.
    #
    # We consider a symbol to be global if:
    #
    #     a. It appears in a .globl declaration; or
    #     b. It appears, doesn't have local form, and is not defined
    #
    endsy = '($|[^a-zA-Z0-9])' # End of a symbol
    allsyms = set()
    defsyms = set()
    glosyms = set()
    result = []

    def findglob1 (glosyms, (a, b, c)):
        mo = re.match('\.globl[ \t]+([^ \t@:,+\-*/()]+)', b)
        if mo:
            glosyms.add(mo.group(1))
        return glosyms

    def findany1 ((allsyms, skipct), (a, b, c)):
        def looksglobal(s):
            if re.match('(r|a|v|p|c|cr|f|s|d|q|mvax|wcgr)[0-9]+$', s, re.I):
                return False # numbered registers
            if re.match('(wr|sb|sl|fp|ip|sp|lr|pc)$', s, re.I):
                return False # named registers
            if re.match('(fpsid|fpscr|fpexc|mvfr1|mvfr0)$', s, re.I):
                return False # more named registers
            if re.match('(mvf|mvd|mvfx|mvdx|dspsc)$', s, re.I):
                return False # even more named registers
            if re.match('(wcid|wcon|wcssf|wcasf|acc)$', s, re.I):
                return False # even more named registers
            if re.match('\.L|[0-9]|#', s):
                return False # local symbol or number
            if re.match('(asl|lsl|lsr|asr|ror|rrx)$', s, re.I):
                return False # shift names
            return True

        if re.match('#', b):
            return (allsyms, skipct)
        # Track nesting of .macro/.endm. For now, we don't look for
        # global syms in macro defs. (Avoiding scoping probs etc.)
        #
        if skipct > 0 and re.match('\.(endm|endmacro)' + endsy, b):
            return (allsyms, skipct - 1)
        if re.match('\.macro' + endsy, b):
            return (allsyms, skipct + 1)
        if skipct > 0:
            return (allsyms, skipct)
        if re.match('\.(type|size|syntax|arch|fpu)' + endsy, b):
            return (allsyms, skipct)
        lb = b
        lb = re.sub('@.*', '', lb)
        lb = re.sub('/\*.*?\*/', '', lb)
        mo = re.match('[^:]*:[ \t]*(.*)', lb)
        if mo:
            lb = mo.group(1)
        lb = re.match('[^ \t]*[ \t]*(.*)', lb).group(1)
        if re.match('\.req', lb):
            return (allsyms, skipct)
        for s in re.findall('[^ \t@:,+\-*/[\]{}()]+', lb):
            if looksglobal(s):
                allsyms.add(s)
        return (allsyms, skipct)

    def finddef1(defsyms, (a, b, c)):
        mo = re.match('([^ \t]+)[ \t]+\.req' + endsy, b) \
            or re.match('([^ \t:]+)[ \t]*:', b)
        if mo:
            defsyms.add(mo.group(1))
        return defsyms

    def repl1((glosyms, result), (a, b, c)):
        if re.match('#', b):
            # Preprocessor line
            result.append((a, b, c))
        else:
            matches = list(re.finditer('[^ \t@:,+*/-]+', b))
            if matches != []:
                matches.reverse()
                newb = b
                for mo in matches:
                    if mo.group() in glosyms:
                        newb = newb[0:mo.start()] + \
                            'Glo(' + mo.group() + ')' + \
                            newb[mo.end():]
                result.append((a, newb, c))
            else:
                result.append((a, b, c))
        return (glosyms, result)

    reduce(findglob1, instrs, glosyms)
    reduce(findany1, instrs, (allsyms, 0))
    reduce(finddef1, instrs, defsyms)
    glosyms |= (allsyms - defsyms)
    reduce(repl1, instrs, (glosyms, result))
    return result

def local_symbols(instrs):
    # The form of a local symbol differs between Linux assemblers and
    # the Apple assembler:
    #
    #     Linux: .Lxxx
    #     Apple: Lxxx
    #
    # Change occurrences of local symbols to use the Loc() cpp macro
    # defined in our prefix.
    #
    lsyms = set()
    result = []

    def find1 (lsyms, (a, b, c)):
        mo = re.match('(\.L[^ \t:]*)[ \t]*:', b)
        if mo:
            lsyms.add(mo.group(1))
        return lsyms

    def repl1((lsyms, result), (a, b, c)):
        matches = list(re.finditer('\.L[^ \t@:,+*/\-()]+', b))
        if matches != []:
            matches.reverse()
            newb = b
            for mo in matches:
                if mo.group() in lsyms:
                    newb = newb[0:mo.start()] + \
                        'Loc(' + mo.group()[2:] + ')' + \
                        newb[mo.end():]
            result.append((a, newb, c))
        else:
            result.append((a, b, c))
        return (lsyms, result)

    reduce(find1, instrs, lsyms)
    reduce(repl1, instrs, (lsyms, result))
    return result

def funtypes(instrs):
    # Linux assemblers accept declarations like this:
    #
    #     .type symbol, %function
    #
    # For Thumb functions, the Apple assembler wants to see:
    #
    #     .thumb_func symbol
    #
    # Handle this by converting declarations to this:
    #
    #     .funtype symbol
    #
    # Our prefix defines an appropriate .funtype macro for each
    # environment.
    #
    result = []

    def repl1(result, (a, b, c)):
        mo = re.match('.type[ \t]+([^ \t,]*),[ \t]*%function', b)
        if mo:
            result.append((a, '.funtype ' + mo.group(1), c))
        else:
            result.append((a, b, c))
        return result

    reduce(repl1, instrs, result)
    return result

def jump_tables(instrs):
    # Jump tables for Linux assemblers often look like this:
    #
    #     tbh [pc, rM, lsl #1]
    #     .short (.Labc-.)/2+0
    #     .short (.Ldef-.)/2+1
    #     .short (.Lghi-.)/2+2
    #
    # The Apple assembler disagrees about the meaning of this code,
    # producing jump tables that don't work. Convert to the following:
    #
    #     tbh [pc, rM, lsl #1]
    # .LBxxx:
    #     .short (.Labc-.LBxxx)/2
    #     .short (.Ldef-.LBxxx)/2
    #     .short (.Lghi-.LBxxx)/2
    #
    # In fact we just convert sequences of .short pseudo-ops of the
    # right form. There's no requirement that they follow a tbh
    # instruction.
    #
    baselabs = []
    result = []

    def short_match(seq, op):
        # Determine whether the op is a .short of the form that needs to
        # be converted: .short (symbol-.)/2+k. If so, return a pair
        # containing the symbol and the value of k. If not, return
        # None. The short can only be converted if there were at least
        # k other .shorts in sequence before the current one. A summary
        # of the previous .shorts is in seq.
        #
        # (A real scanner and parser would do a better job, but this was
        # quick to get working.)
        #
        sp = '([ \t]|/\*.*?\*/)*' # space
        sp1 = '([ \t]|/\*.*?\*/)+' # at least 1 space
        spe = '([ \t]|/\*.*?\*/|@[^\n]*)*$' # end-of-instr space
        expr_re0 = (
            '\.short' + sp + '\(' + sp +          # .short (
            '([^ \t+\-*/@()]+)' + sp +            # symbol
            '-' + sp + '\.' + sp + '\)' + sp +    # -.)
            '/' + sp + '2' + spe                  # /2 END
        )
        expr_re1 = (
            '\.short' + sp + '\(' + sp +          # .short (
            '([^ \t+\-*/@()]+)' + sp +            # symbol
            '-' + sp + '\.' + sp + '\)' + sp +    # -.)
            '/' + sp + '2' + sp +                 # /2
            '\+' + sp +                           # +
            '((0[xX])?[0-9]+)' + spe              # k END
        )
        expr_re2 = (
            '\.short' + sp1 +                     # .short
            '((0[xX])?[0-9]+)' + sp +             # k
            '\+' + sp + '\(' + sp +               # +(
            '([^ \t+\-*/@()]+)' + sp +            # symbol
            '-' + sp + '\.' + sp + '\)' + sp +    # -.)
            '/' + sp + '2' + spe                  # /2 END
        )
        mo = re.match(expr_re0, op)
        if mo:
            return(mo.group(3), 0)
        mo = re.match(expr_re1, op)
        if mo:
            k = int(mo.group(11), 0)
            if k > len(seq):
                return None
            return (mo.group(3), k)
        mo = re.match(expr_re2, op)
        if mo:
            k = int(mo.group(2), 0)
            if k > len(seq):
                return None
            return (mo.group(7), k)
        return None

    def conv1 ((baselabs, shortseq, label, result), (a, b, c)):
        # Convert current instr (a,b,c) if it's a .short of the right
        # form that spans a previous sequence of .shorts.
        #
        (b1, b2, b3) = parse_iparts(b)
        if b3 == '':
            # No operation: just note label if present.
            result.append((a, b, c))
            if re.match('\.L.', b1):
                return (baselabs, shortseq, b1, result)
            return (baselabs, shortseq, label, result)
        if not re.match('.short[ \t]+[^ \t@]', b3):
            # Not a .short: clear shortseq and label
            result.append((a, b, c))
            return (baselabs, [], '', result)
        # We have a .short: figure out the label if any
        if re.match('\.L', b1):
            sl = b1
        else:
            sl = label
        mpair = short_match(shortseq, b3)
        if not mpair:
            # A .short, but not of right form
            shortseq.append((len(result), sl))
            result.append((a, b, c))
            return (baselabs, shortseq, '', result)
        # OK, we have a .short to convert!
        (sym, k) = mpair
        shortseq.append((len(result), sl))
        # Figure out base label (create one if necessary).
        bx = len(shortseq) - 1 - k
        bl = shortseq[bx][1]
        if bl == '':
            bl = g_basepfx + str(shortseq[bx][0])
            shortseq[bx] = (shortseq[bx][0], bl)
            baselabs.append(shortseq[bx])
        op = '.short\t(' + sym + '-' + bl + ')/2'
        result.append ((a, b1 + b2 + op, c))
        return (baselabs, shortseq, '', result)

    # Convert, accumulate result and new labels.
    reduce(conv1, instrs, (baselabs, [], '', result))
    # Add labels created here to the instruction stream.
    baselabs.reverse()
    for (ix, lab) in baselabs:
        result[ix:0] = [('', lab + ':', '\n')]
    # That does it
    return result

def read_input():
    # Concatenate all the input files into a string.
    #
    def fnl(s):
        if s == '' or s[-1] == '\n':
            return s
        else:
            return s + '\n'

    if len(sys.argv) < 2:
        return fnl(sys.stdin.read())
    else:
        input = ""
        for f in sys.argv[1:]:
            try:
                fd = open(f)
                input = input + fnl(fd.read())
                fd.close()
            except:
                sys.stderr.write('arm-as-to-ios: cannot open ' + f + '\n')
        return input

def parse_instrs(s):
    # Parse the string into assembly instructions, also noting C
    # preprocessor lines. Each instruction is represented as a triple:
    # (space/comments, instruction, end). The end is either ';' or
    # '\n'. Instructions might have embedded comments, but they
    # probably won't get fixed up if they do. (I've never seen it in
    # real code.)
    #
    def goodmo(mo):
        if mo == None:
            # Should never happen
            sys.stderr.write('arm-as-to-ios: internal parsing error\n')
            sys.exit(1)

    cpp_re = '([ \t]*)(#([^\n]*\\\\\n)*[^\n]*[^\\\\\n])\n'
    comment_re = '[ \t]*#[^\n]*'
    instr_re = (
        '(([ \t]|/\*.*?\*/|@[^\n]*)*)' # Spaces & comments
        '(([ \t]|/\*.*?\*/|[^;\n])*)' # "Instruction"
        '([;\n])' # End
    )
    instrs = []
    while s != '':
        if re.match('[ \t]*#[ \t]*(if|ifdef|elif|else|endif|define)', s):
            mo = re.match(cpp_re, s)
            goodmo(mo)
            instrs.append((mo.group(1), mo.group(2), '\n'))
        elif re.match('[ \t]*#', s):
            mo = re.match(comment_re, s)
            goodmo(mo)
            instrs.append((mo.group(0), '', '\n'))
        else:
            mo = re.match(instr_re, s, re.DOTALL)
            goodmo(mo)
            instrs.append((mo.group(1), mo.group(3), mo.group(5)))
        s = s[len(mo.group(0)):]
    return instrs

def parse_iparts(i):
    # Parse an instruction into smaller parts, returning a triple of
    # strings (label, colon, operation). The colon part also contains
    # any surrounding spaces and comments (making the label and the
    # operation cleaner to process).
    #
    # (Caller warrants that the given string doesn't start with space or
    # a comment. This is true for strings returned by the instruction
    # parser.)
    #
    lab_re = (
        '([^ \t:/@]+)' # Label
        '(([ \t]|/\*.*?\*/|@[^\n]*)*)' # Spaces & comments
        ':' # Colon
        '(([ \t]|/\*.*?\*/|@[^\n]*)*)' # Spaces & comments
        '([^\n]*)' # Operation
    )
    if len(i) > 0 and i[0] == '#':
        # C preprocessor line; treat as operation.
        return ('', '', i)
    mo = re.match(lab_re, i)
    if mo:
        return (mo.group(1), mo.group(2) + ':' + mo.group(4), mo.group(6))
    # No label, just an operation
    return ('', '', i)

def debug_parse(a, b, c):
    # Show results of instruction stream parse.
    #
    (b1, b2, b3) = parse_iparts(b)
    newb = '{' + b1 + '}' + '{' + b2 + '}' + '{' + b3 + '}'
    sys.stdout.write('{' + a + '}' + newb + c)

def main():
    instrs = parse_instrs(read_input())
    instrs = explicit_address_loads(instrs)
    instrs = funtypes(instrs)
    instrs = jump_tables(instrs)
    instrs = global_symbols(instrs)
    instrs = local_symbols(instrs)
    instrs = add_prefix(instrs)
    for (a, b, c) in instrs:
        sys.stdout.write(a + b + c)

main()
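
A note for anyone running this under a newer interpreter: the listing is Python 2 code. It uses tuple parameters in function definitions, such as def repl1((lsyms, result), (a, b, c)), and the builtin reduce. Python 3 removed tuple parameters and moved reduce to functools. As a rough idea of what a port might look like, here is a sketch of the local symbol pass for Python 3. It is an illustration rather than a tested replacement: it works on plain lines instead of the script's (space, instruction, end) triples, and the names DEF_RE and USE_RE are made up for the example.

```python
import re

# A local label definition: ".Lxxx:" at the start of a line.
DEF_RE = re.compile(r"\s*(\.L[^ \t:]*)[ \t]*:")
# Anything shaped like a use of a local symbol.
USE_RE = re.compile(r"\.L[^ \t@:,+*/\-()]+")


def local_symbols(lines):
    """Wrap defined local symbols in Loc(), so .Lfoo becomes Loc(foo)."""
    # First pass: collect the local labels that are actually defined.
    defined = set()
    for line in lines:
        m = DEF_RE.match(line)
        if m:
            defined.add(m.group(1))

    def wrap(m):
        name = m.group()
        # Names that are never defined are left alone.
        return "Loc(" + name[2:] + ")" if name in defined else name

    # Second pass: rewrite every use (and definition) of those labels.
    return [USE_RE.sub(wrap, line) for line in lines]


sample = [
    "    bl .Lcall_gc",
    ".Lcall_gc:",
    "    ldr r0, .Lother",
]
wrapped = local_symbols(sample)
print("\n".join(wrapped))
```

Passing a function to re.sub does the job of the reversed-match loop in the original, and the two explicit passes replace the two reduce calls.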