Getting Started with Appium for macOS Desktop App Testing with Claude Code
Automating native Mac applications with Appium and Claude Code
1. What Appium Actually Is (and Why It Matters for Desktop)
Most engineers encounter Appium in a mobile context, but it has always supported native desktop application testing on macOS through its mac2 driver, and this is genuinely useful territory that gets far less attention than it deserves. If you’ve ever needed to automate a workflow in Terminal, verify that a Word document opens and renders correctly, test that your internal macOS tool behaves properly after an update, or drive any native Mac application the way a user would, Appium is the right tool for the job.
Appium is an open-source test automation framework built on the WebDriver protocol, the same wire protocol that Selenium uses. It doesn’t inject code into your application or require recompilation. Instead, it talks to the macOS Accessibility API, the same system that powers VoiceOver and keyboard navigation, to find elements on screen, interact with them, and read their state. This means it works with any application that participates in the macOS accessibility model, which includes virtually every well-behaved native app: system apps like Finder, Calculator, Notes, and Terminal, productivity tools like Microsoft Word and Excel, developer tools, and your own internal applications.
The architecture has three parts: the Appium server, which is a Node.js HTTP server that receives WebDriver commands; the mac2 driver, which translates those commands into macOS Accessibility API calls; and your test client, which is a library in the language of your choice that sends HTTP requests to the server. You write tests in Python, the client sends commands to the Appium server, the server tells the mac2 driver what to do, and the driver interacts with the application on screen. Understanding this chain matters because most setup problems occur at one of these handoff points, and knowing which layer is failing saves a lot of diagnostic time.
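To make the first handoff concrete, here is a minimal sketch that checks the client-to-server hop with nothing but the Python standard library. It assumes the server is at http://localhost:4723 and answers the standard WebDriver /status endpoint with a JSON body containing a "ready" flag; the function names parse_ready and server_is_ready are illustrative and not part of any Appium library.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def parse_ready(body: str) -> bool:
    """Return True if a WebDriver /status response body reports ready."""
    payload = json.loads(body)
    value = payload.get("value") if isinstance(payload, dict) else None
    return isinstance(value, dict) and value.get("ready") is True


def server_is_ready(base_url: str = "http://localhost:4723",
                    timeout: float = 2.0) -> bool:
    """Ask the server for its status; False if unreachable or not ready."""
    try:
        with urlopen(f"{base_url}/status", timeout=timeout) as response:
            return parse_ready(response.read().decode("utf-8"))
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body
        return False
```

If server_is_ready() returns False, the problem is in the first layer (the server is not running or not reachable) rather than in the driver or the application under test.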
2. Installing and Configuring Everything in One Go
Rather than walking through each dependency step by step, the cleanest approach is a single script that handles the entire chain: Homebrew, nvm, Node.js, Xcode CLI tools, Appium, the mac2 driver, the Python client, and the diagnostic check, all in the correct order. Save the following as setup-appium.sh in your project root, make it executable, and run it once.
#!/usr/bin/env bash
set -euo pipefail
# Appium macOS Desktop Testing Setup Script
# Automates native macOS applications via the mac2 driver and Accessibility API
# Prerequisites: Xcode installed from the App Store before running this script
echo "==> Checking Xcode installation..."
if ! xcode-select -p &>/dev/null; then
    echo "ERROR: Xcode is not installed. Install it from the App Store first."
    exit 1
fi
echo "==> Accepting Xcode licence..."
sudo xcodebuild -license accept
echo "==> Installing Xcode CLI tools..."
xcode-select --install 2>/dev/null || echo "CLI tools already installed, continuing."
echo "==> Installing Homebrew (if not present)..."
if ! command -v brew &>/dev/null; then
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
    eval "$(/opt/homebrew/bin/brew shellenv)"
fi
echo "==> Installing nvm..."
if [ ! -d "$HOME/.nvm" ]; then
    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
fi
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"
echo "==> Installing Node.js LTS via nvm..."
nvm install --lts
nvm use --lts
# Point the default alias at the LTS line installed above, not the newest release
nvm alias default 'lts/*'
echo "==> Installing Appium server..."
npm install -g appium
echo "==> Installing mac2 driver for macOS desktop app testing..."
appium driver install mac2
echo "==> Verifying installed drivers..."
appium driver list --installed
echo "==> Installing Appium Doctor..."
npm install -g @appium/doctor
echo "==> Installing Python client and test dependencies..."
pip3 install --upgrade pip
pip3 install Appium-Python-Client pytest
echo "==> Installing Appium Inspector (GUI element explorer)..."
brew install --cask appium-inspector
echo "==> Installing Claude Code..."
npm install -g @anthropic-ai/claude-code
echo ""
echo "==> Running Appium Doctor..."
appium-doctor || true
echo ""
echo "======================================================"
echo "Setup complete."
echo ""
echo "Next steps:"
echo " 1. Grant accessibility permissions to Terminal:"
echo " System Settings > Privacy & Security > Accessibility"
echo " Add Terminal (or iTerm2) and enable it."
echo " 2. Start the Appium server: appium"
echo " 3. Open Appium Inspector and connect to http://localhost:4723"
echo " to explore the element tree of any running Mac application."
echo " 4. Start writing tests."
echo "======================================================"
Run it with:
chmod +x setup-appium.sh
./setup-appium.sh
The script is idempotent in most respects, so re-running it after a partial failure won’t break anything that already succeeded. The one step it cannot automate is granting accessibility permissions to your terminal application. macOS requires this to be done manually through System Settings, and without it the mac2 driver cannot interact with any application on screen. Go to System Settings > Privacy & Security > Accessibility, click the plus button, and add Terminal or whichever terminal application you use. This permission persists across reboots and only needs to be set once.
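After a partial failure it helps to know which tools actually made it onto your PATH before re-running the script. The following is a small sketch under that assumption; the command list and the name missing_commands are illustrative choices, and nvm is left out on purpose because it is a shell function rather than an executable.

```python
import shutil

# Executables the setup script is expected to leave on PATH (illustrative list)
REQUIRED_COMMANDS = ("node", "npm", "appium", "python3", "pip3")


def missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands from the list that cannot be found on PATH."""
    return [name for name in commands if shutil.which(name) is None]
```

Calling missing_commands() from a Python prompt returns an empty list when everything is installed, or the names that still need attention.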
3. Your First Test: Automating macOS Calculator
With the environment configured, start the Appium server in a dedicated terminal tab. No extra flags such as --relaxed-security are needed for the examples in this article, so the server can be started with its defaults:
appium
The server listens on http://localhost:4723 and logs every incoming WebDriver command, which makes it very useful for understanding what your tests are actually sending when something goes wrong.
Every Appium session begins with a capabilities object, a JSON dictionary that tells the server which driver to use and which application to launch. For macOS desktop testing the critical fields are platformName, automationName (always mac2 for macOS), and either bundleId for applications already installed on your machine or app pointing to an absolute path for applications you want to launch fresh. To find the bundle ID of any installed application you can run osascript -e 'id of app "Calculator"' or browse /Applications and inspect the Info.plist inside the app bundle.
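As a rough illustration of what that capabilities object looks like on the wire, the sketch below builds the JSON body of a W3C WebDriver "new session" request by hand. The Python client normally assembles this for you; the helper name mac2_session_payload is hypothetical. Note that non-standard capabilities carry the appium: vendor prefix, while platformName does not.

```python
import json


def mac2_session_payload(bundle_id: str) -> dict:
    """Build a W3C new-session request body for a mac2 session."""
    capabilities = {
        "platformName": "mac",              # standard WebDriver capability
        "appium:automationName": "mac2",    # vendor-prefixed Appium capability
        "appium:bundleId": bundle_id,       # application to launch or attach to
    }
    return {"capabilities": {"alwaysMatch": capabilities, "firstMatch": [{}]}}


payload = mac2_session_payload("com.apple.calculator")
print(json.dumps(payload["capabilities"]["alwaysMatch"], indent=2))
```

The printed alwaysMatch object is also the shape Appium Inspector expects when you enter capabilities as JSON.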
Here is a complete test that opens Calculator, performs an addition, and asserts the result. It uses the Page Object Model from the start because the pattern pays for itself immediately once you have more than a handful of tests, and it gives Claude Code a clear convention to follow when generating new tests later:
# pages/calculator_page.py
from appium.webdriver.common.appiumby import AppiumBy
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class CalculatorPage:
    # Identifiers can vary by macOS version; verify them in Appium Inspector
    DISPLAY = (AppiumBy.ACCESSIBILITY_ID, "main display")

    def __init__(self, driver):
        self.driver = driver
        self.wait = WebDriverWait(driver, 5)

    def press(self, key: str):
        # Wait for the button, click it, and return self so calls can be chained
        self.wait.until(EC.presence_of_element_located(
            (AppiumBy.ACCESSIBILITY_ID, key)
        )).click()
        return self

    def result(self) -> str:
        # Read the current value shown in the display element
        return self.wait.until(
            EC.presence_of_element_located(self.DISPLAY)
        ).get_attribute("value")

# conftest.py
import pytest
from appium import webdriver
from appium.options.mac import Mac2Options


@pytest.fixture(scope="session")
def driver():
    # Mac2Options is the options class for the mac2 driver
    # (XCUITestOptions targets the iOS XCUITest driver)
    options = Mac2Options()
    options.platform_name = "mac"
    options.automation_name = "mac2"
    options.bundle_id = "com.apple.calculator"
    d = webdriver.Remote("http://localhost:4723", options=options)
    yield d
    d.quit()

# tests/test_calculator.py
from pages.calculator_page import CalculatorPage


def test_simple_addition(driver):
    calc = CalculatorPage(driver)
    calc.press("2").press("add").press("3").press("=")
    assert calc.result() == "5"


def test_multiplication(driver):
    calc = CalculatorPage(driver)
    calc.press("4").press("multiply").press("7").press("=")
    assert calc.result() == "28"

Run with:
pytest tests/test_calculator.py -v
4. Finding Elements in Mac Applications
The hardest part of writing Appium tests for desktop apps is identifying elements reliably. Unlike the web where CSS selectors and visible text are predictable, native macOS applications use accessibility identifiers, labels, and element roles that are not always obvious from looking at the screen. Appium Inspector is how you solve this. It connects to your running Appium server, launches your application with a session, and shows you the full accessibility tree alongside a live screenshot. You can click any element in the tree or on the screenshot and see its attributes: the accessibility identifier, label, value, role, and any other properties the application exposes.
Open Appium Inspector after starting the server, point it at http://localhost:4723, enter your capabilities, and click Start Session. Once the session is live you can explore the tree freely. The attributes you see here are exactly what you use in your test code to locate elements.
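Outside the Inspector, the same attributes can be pulled from the XML that driver.page_source returns. The sketch below walks such a document and suggests a locator for each element that exposes an identifier or a label. The sample XML is a made-up stand-in, and the attribute names identifier and label are assumptions about what the dump contains, so check them against a real dump from your application.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed page source used only for illustration
SAMPLE_SOURCE = """
<XCUIElementTypeApplication title="Example">
  <XCUIElementTypeWindow identifier="main_window" label="Example">
    <XCUIElementTypeButton identifier="save_button" label="Save" />
    <XCUIElementTypeButton label="Cancel" />
    <XCUIElementTypeGroup />
  </XCUIElementTypeWindow>
</XCUIElementTypeApplication>
"""


def locator_candidates(page_source: str):
    """Yield (element type, strategy, value) for elements that can be located."""
    root = ET.fromstring(page_source.strip())
    for element in root.iter():
        identifier = element.get("identifier")
        label = element.get("label")
        if identifier:
            # Prefer the developer-assigned identifier when one exists
            yield element.tag, "accessibility id", identifier
        elif label:
            # Fall back to the label, which suits a predicate string
            yield element.tag, "label", label


for candidate in locator_candidates(SAMPLE_SOURCE):
    print(candidate)
```

Elements with neither attribute are skipped, which is a quick way to spot the parts of a screen that will need a class chain or XPath locator instead.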
Locator strategies should be chosen in order of reliability. Accessibility ID reads the accessibilityIdentifier set by the application developer and is the most stable locator because it doesn’t change when the UI is restyled or rearranged. Predicate string is a flexible macOS native query syntax that lets you match on multiple attributes at once. Class chain offers structured traversal of the hierarchy without the cost of full XPath. XPath traverses the entire accessibility tree on every call, which makes it slow and brittle as the application changes, so treat it as a last resort. In the Python client the predicate and class chain strategies carry iOS-prefixed names because the same strategies are shared with the iOS driver:
# Most reliable: stable across UI changes
driver.find_element(AppiumBy.ACCESSIBILITY_ID, "save_button")
# Predicate string: match on label, value, or role (elementType 9 is a button)
driver.find_element(AppiumBy.IOS_PREDICATE, "label == 'Save' AND elementType == 9")
# Class chain: structured traversal without full XPath overhead
driver.find_element(AppiumBy.IOS_CLASS_CHAIN, "**/XCUIElementTypeButton[`label == 'Save'`]")
# XPath: use only when nothing else works
driver.find_element(AppiumBy.XPATH, "//XCUIElementTypeButton[@label='Save']")
To get the full accessibility tree from a running session and save it for use as context when generating tests with AI, dump the page source programmatically:
page_source = driver.page_source
with open("element_trees/current_screen.xml", "w", encoding="utf-8") as f:
    f.write(page_source)
5. Testing Real Applications: Terminal, Notes, and Word
Testing system applications and productivity tools follows exactly the same pattern as Calculator, with the accessibility tree being the key to understanding what elements are available. Terminal is a useful example because it demonstrates how to interact with text input and read output, which covers a wide class of automation scenarios.
# pages/terminal_page.py
from appium.webdriver.common.appiumby import AppiumBy
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class TerminalPage:
    TERMINAL_WINDOW = (AppiumBy.IOS_CLASS_CHAIN, "**/XCUIElementTypeWindow[1]")

    def __init__(self, driver):
        self.driver = driver
        self.wait = WebDriverWait(driver, 10)

    def type_command(self, command: str):
        window = self.wait.until(
            EC.presence_of_element_located(self.TERMINAL_WINDOW)
        )
        # Focus the window, then type the command followed by Return
        window.click()
        window.send_keys(command + "\n")
        return self

    def get_window_text(self) -> str:
        # Re-find the window so the reference is never stale
        window = self.driver.find_element(*self.TERMINAL_WINDOW)
        return window.get_attribute("value") or window.text

# tests/test_terminal.py
import time
from pages.terminal_page import TerminalPage


# Assumes a driver fixture whose bundle_id is com.apple.Terminal,
# not the Calculator fixture shown earlier
def test_echo_command_produces_output(driver):
    terminal = TerminalPage(driver)
    terminal.type_command("echo hello_appium")
    time.sleep(1)  # give the shell a moment to print its output
    output = terminal.get_window_text()
    assert "hello_appium" in output

For Microsoft Word, the accessibility tree is rich and well-labelled because Word has mature accessibility support. You can locate the document body, toolbar buttons, and menu items by their accessibility identifiers, though the exact identifiers should be confirmed in Appium Inspector for the Word version you are testing. A typical test might verify that a document opens, a specific heading is present, and that Save As completes without error:
# pages/word_page.py
from appium.webdriver.common.appiumby import AppiumBy
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class WordPage:
    DOCUMENT_BODY = (AppiumBy.ACCESSIBILITY_ID, "Document Body")
    SAVE_BUTTON = (AppiumBy.ACCESSIBILITY_ID, "Save")

    def __init__(self, driver):
        self.driver = driver
        self.wait = WebDriverWait(driver, 15)

    def body_is_present(self) -> bool:
        return self.wait.until(
            EC.presence_of_element_located(self.DOCUMENT_BODY)
        ).is_displayed()

    def type_into_body(self, text: str):
        body = self.wait.until(
            EC.presence_of_element_located(self.DOCUMENT_BODY)
        )
        body.click()
        body.send_keys(text)
        return self

    def save(self):
        self.wait.until(EC.element_to_be_clickable(self.SAVE_BUTTON)).click()
        return self

Notes is another good target because it represents a class of applications where the primary interaction is text entry and retrieval, and the accessibility tree reliably exposes note content as readable text:
# conftest.py addition for Notes
from appium.options.mac import Mac2Options  # options class for the mac2 driver


@pytest.fixture(scope="function")
def notes_driver():
    options = Mac2Options()
    options.platform_name = "mac"
    options.automation_name = "mac2"
    options.bundle_id = "com.apple.Notes"
    d = webdriver.Remote("http://localhost:4723", options=options)
    yield d
    d.quit()

# tests/test_notes.py
from appium.webdriver.common.appiumby import AppiumBy
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def test_new_note_can_be_created(notes_driver):
    wait = WebDriverWait(notes_driver, 10)
    new_note_btn = wait.until(
        EC.presence_of_element_located((AppiumBy.ACCESSIBILITY_ID, "New Note"))
    )
    new_note_btn.click()
    editor = wait.until(
        EC.presence_of_element_located((AppiumBy.ACCESSIBILITY_ID, "Note Body Text View"))
    )
    editor.send_keys("Automated note created by Appium")
    content = editor.get_attribute("value")
    assert "Automated note created by Appium" in content

6. AI-Powered Test Generation
The most practical use of AI in Appium testing is not autonomous application exploration but rather fast, context-aware code generation. You supply the accessibility tree and describe what you want to test, and the AI produces well-structured page objects and test cases in seconds rather than the hour it would take to write them from scratch.
The most powerful input you can give Claude is the raw accessibility tree XML from your application. Paste it directly into the conversation with a prompt like: “Here is the Appium page source XML for the Microsoft Word toolbar. Generate a Python Page Object class that covers the formatting buttons and the File menu, using accessibility IDs where available and predicate strings where not. Then write five pytest test cases that verify the bold, italic, and underline buttons toggle correctly.” Claude will read the element types, labels, and identifiers in the tree and make sensible decisions about which locators are stable rather than positional and fragile.
If you don’t have the element tree to hand, a screenshot is the next best input. Take one during a session with driver.save_screenshot("screen.png"), upload it, and describe what you want to test. The locators will be inferred rather than read from the real hierarchy so you’ll need to validate them in Appium Inspector before committing, but the class structure, fixture wiring, method signatures, and assertion patterns will all be correct and save you significant time.
7. Using Claude Code for Appium
Claude Code is Anthropic’s agentic coding tool that runs in your terminal. Unlike pasting questions into a chat interface and copying code back and forth, Claude Code has direct access to your filesystem, can run shell commands, read your existing test files, and edit them in place. For Appium testing this changes the workflow fundamentally: you describe what you want and Claude Code does the work in your actual project rather than in a chat window.
Start it from your project root:
cd your-appium-project
claude
The quality of what Claude Code produces scales directly with the context it can read from your project. Before asking it to generate anything, make sure you have a README.md describing your application and testing conventions, a conftest.py with your driver fixture, at least one complete page object and test file as a style reference, and an element_trees/ directory with XML dumps from Appium Inspector for the screens you want to test. With those files present Claude Code reads them automatically and follows your conventions without needing them explained in every prompt.
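A quick way to confirm that context is in place before starting a session is a short check like the one below. It is only a sketch: the list of expected entries mirrors the files named above, and missing_context is an illustrative helper, not something Claude Code provides or requires.

```python
from pathlib import Path

# Files and directories that give a code-generation session useful context
EXPECTED_ENTRIES = ("README.md", "conftest.py", "pages", "tests", "element_trees")


def missing_context(project_root) -> list:
    """Return the expected files or directories absent from the project root."""
    root = Path(project_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

Running missing_context(".") from the project root returns an empty list when everything is present.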
Generating a new page object from an element tree becomes a single instruction:
> I've added element_trees/word_toolbar.xml from Appium Inspector.
> Create a new Page Object class in pages/word_page.py following
> the same pattern as pages/calculator_page.py. Use accessibility
> IDs where available.
Writing tests for an existing page object is equally direct:
> Look at pages/word_page.py and write tests/test_word.py covering:
> document opens successfully, bold formatting toggles on and off,
> Save As dialog appears and can be dismissed. Use the session-scoped
> driver fixture from conftest.py.
Debugging a failing test is where the agentic capability earns its keep. Rather than reading a log file and tracing the error back through the code manually, hand the problem to Claude Code:
> tests/test_word.py::test_bold_toggles is failing with
> NoSuchElementException on bold_button. The error output is in
> test_output.log. Read the log and the page object and tell me
> why the element isn't being found.
Claude Code reads both files and identifies whether the problem is a timing issue, a locator that doesn’t match what the accessibility tree actually contains, or a state issue where the toolbar hasn’t rendered yet, then proposes a targeted fix. Running tests and iterating on failures in a single session is the most efficient pattern:
> Run pytest tests/test_word.py -v and if any tests fail, read the
> output and propose fixes. Show me the diff before applying anything.
The “show me the diff first” instruction is worth keeping until you’re comfortable with the workflow because Claude Code can make coordinated changes across multiple files at once and you want visibility before they land. For refactoring work like replacing brittle XPath locators across an entire suite, Claude Code handles it in a single pass:
> Go through all page objects in pages/ and replace any XPath
> locators with predicate strings or class chains where possible.
> Add a comment explaining the syntax for any complex ones.
8. Common Gotchas
The single most important rule in Appium testing is to never mix implicit and explicit waits. Implicit waits tell Appium to poll for elements for a set duration on every find_element call. Explicit waits using WebDriverWait do the same but only for a specific condition. When both are active their interactions produce unpredictable timing that makes tests flaky in ways that are genuinely hard to diagnose. Disable implicit waits at session start and use WebDriverWait everywhere:
driver.implicitly_wait(0)
Related to this is element staleness. macOS applications re-render views in response to state changes, and element references captured earlier in a test can become invalid. Don’t hold element objects across interactions. Find the element, use it immediately, and find it again if you need it a second time. WebDriverWait handles this naturally because it re-queries the accessibility tree on every poll cycle.
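The re-query-on-every-poll idea can be sketched in a few lines of plain Python. This is a simplified stand-in written for illustration, not WebDriverWait's actual implementation: wait_for is a hypothetical helper that calls a lookup function from scratch on each cycle, and unlike WebDriverWait it swallows every exception type until the deadline.

```python
import time


def wait_for(find, timeout: float = 5.0, interval: float = 0.2):
    """Call find() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            # A fresh lookup each cycle, so no stale reference is ever reused
            result = find()
            if result:
                return result
        except Exception as error:  # e.g. a missing or stale element
            last_error = error
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s") from last_error
        time.sleep(interval)
```

In a test this would be used with a lambda that performs the lookup, for example wait_for(lambda: driver.find_element(AppiumBy.ACCESSIBILITY_ID, "save_button")), so the element is always found immediately before it is used.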
Accessibility permissions are a common source of confusing failures. If Appium can connect to your application but cannot find any elements, or if the element tree in Appium Inspector appears completely empty, the most likely cause is that Terminal does not have accessibility permissions. Go to System Settings > Privacy & Security > Accessibility and confirm your terminal application is listed and enabled. This needs to be done for the process that runs Appium tests, so if you use iTerm2 rather than the built-in Terminal, grant the permission to iTerm2.
Some macOS applications have partially or poorly implemented accessibility trees. Menu items are usually well exposed, toolbar buttons vary, and custom views built without accessibility in mind may be completely invisible to the Accessibility API. In these cases the predicate string strategy with a value search or a partial label match is often more resilient than relying on identifiers that may not exist:
# Fallback when accessibility IDs are absent or inconsistent
driver.find_element(AppiumBy.IOS_PREDICATE, "value CONTAINS 'Save' AND elementType == 9")
If you’re on Zscaler, it can intercept localhost traffic in certain configurations. If your test client reports a connection error when trying to reach http://localhost:4723, Zscaler’s packet inspection may be treating local connections as external traffic. Adding localhost and 127.0.0.1 to the Zscaler bypass list resolves this.
Finally, don’t ask Claude Code to edit test files while pytest is actively reading them. Finish the test run first, then hand off to Claude Code. Conversely, if Claude Code is mid-way through a multi-file edit, don’t start a test run against a partially modified codebase. Both tools are fast enough that sequencing them adds no meaningful overhead.
9. A Complete Project Structure
A well-organised Appium project for macOS desktop testing should look like this:
my-mac-tests/
├── setup-appium.sh
├── conftest.py
├── pytest.ini
├── requirements.txt
├── README.md
│
├── pages/
│   ├── __init__.py
│   ├── base_page.py
│   ├── calculator_page.py
│   ├── notes_page.py
│   ├── terminal_page.py
│   └── word_page.py
│
├── tests/
│   ├── test_calculator.py
│   ├── test_notes.py
│   ├── test_terminal.py
│   └── test_word.py
│
├── test_data/
│   └── documents/
│
├── element_trees/
│   ├── calculator.xml
│   ├── notes_editor.xml
│   └── word_toolbar.xml
│
└── utils/
    ├── wait_helpers.py
    └── screenshot.py
10. Where to Go from Here
Once you’re comfortable with the local workflow, the logical next step is running your suite in CI. GitHub Actions supports macOS runners, and the Appium server and mac2 driver install on them the same way they do locally. The accessibility permission covered here still applies, and on a runner it cannot be granted by clicking through System Settings, so confirm how your CI environment handles it before relying on the suite there. For more complex scenarios involving applications that require real user credentials or specific system state, pytest fixtures that handle setup and teardown at the session level keep that complexity isolated from the test logic itself.
The AI-assisted workflow covered here will continue to improve. The pattern of exporting accessibility trees, feeding them to Claude, and generating or refactoring test code is already high-value and reliable. As vision models improve, screenshot-to-test fidelity will increase to the point where the tree export step becomes optional for initial scaffolding. The fundamentals you’ve built here, Page Object Model, explicit waits, fixture-based driver management, and Claude Code integration, will remain the right foundation regardless of how the tooling evolves.