A Swift package for seeing and controlling a Mac screen.
import MacControlKit
let computer = MacComputer()
let status = await computer.requestPermissions()
guard status.canCaptureScreen && status.canControlInput else {
    print("Screen Recording and Accessibility permissions are required.")
    return
}
let screenshot = try await computer.screenshot()
print("Captured \(screenshot.size.width)x\(screenshot.size.height)")
try await computer.perform(.click(.init(x: 500, y: 500)))
try await computer.perform(.type("hello"))
try await computer.perform(.keyPress(key: "l", modifiers: [.command]))
try await computer.perform(.openURL(URL(string: "https://example.com")!))
try await computer.perform(.launchApp("Safari"))

MacControlKit is the control layer for Mac apps, command-line tools, and experiments that need to operate the user's computer with permission.
At its core, it is a small wrapper around the macOS APIs you usually have to stitch together yourself:
- ScreenCaptureKit for screenshots.
- Accessibility and TCC permission checks.
- CoreGraphics events for mouse, keyboard, drag, and scroll.
- AppKit and LaunchServices for opening URLs and launching apps.
- Coordinate conversion for Retina displays and global screen coordinates.
MacControlKit does not plan tasks, talk to a model, run a cloud service, or ship a chat UI. It just gives you a clean way to capture the screen and send input.
Bring your own model, planner, UI, script, or test runner.
macOS already has the pieces, but the first few days of any "computer use" project tend to look the same:
- Figure out Screen Recording permission.
- Figure out Accessibility permission.
- Capture a screenshot.
- Convert screenshot coordinates into macOS screen points.
- Post mouse and keyboard events without getting tripped up by Retina scaling.
- Repeat the whole thing in the next app.
MacControlKit packages that foundation so app authors can start one layer higher.
MacControlKit supports macOS 15.2 and newer.
- Open your project in Xcode.
- Choose File > Add Package Dependencies.
- Enter https://github.com/tzafon/MacControlKit.git.
- Add the MacControlKit product to your app target.
- Set your target's macOS deployment target to 15.2 or newer.
- Import the package where you need it:
import MacControlKit

// Package.swift
import PackageDescription

let package = Package(
    name: "YourPackage",
    platforms: [
        .macOS("15.2")
    ],
    dependencies: [
        .package(url: "https://github.com/tzafon/MacControlKit.git", from: "0.1.0")
    ],
    targets: [
        .executableTarget(
            name: "YourTool",
            dependencies: [
                .product(name: "MacControlKit", package: "MacControlKit")
            ]
        )
    ]
)

Then use it from your target:
import MacControlKit
let computer = MacComputer()
print(await computer.permissions())

You can try MacControlKit without adding it to a Swift package first:
git clone https://github.com/tzafon/MacControlKit.git
cd MacControlKit
swift run maccontrol help
swift run maccontrol permissions
swift run maccontrol request-permissions
swift run maccontrol screenshot ./screen.jpg

After Screen Recording and Accessibility permissions are granted, you can try input control:
swift run maccontrol click 500 500
swift run maccontrol type "hello"
swift run maccontrol key --modifier command l
swift run maccontrol open-url https://example.com
swift run maccontrol launch Safari

swift run needs to be run from the package directory, where Package.swift lives.
For repeated local testing, build once and call the binary directly:
swift build
.build/debug/maccontrol permissions
.build/debug/maccontrol screenshot ./screen.jpg

The CLI is mostly executable documentation. It is useful for checking permissions, screenshots, and coordinates before embedding the library in your own app.
MacControlKit uses Swift DocC for API documentation and guide pages.
In Xcode, open the package and choose Product > Build Documentation. The docs catalog lives at Sources/MacControlKit/MacControlKit.docc.
The GitHub Pages workflow publishes the rendered DocC site from main:
https://tzafon.github.io/MacControlKit/documentation/maccontrolkit/
The first guide pages cover:
- Getting started
- Permissions
- Coordinates
- Command-line usage
- Building agents on top
By default, MacControlKit uses a normalized 0-999 coordinate system:
- (0, 0) is the top-left of the visible screen.
- (999, 999) is the bottom-right.
- (500, 500) is roughly the center.

try await computer.perform(.click(.init(x: 485, y: 30)))

That means "about 48.5% from the left and 3% from the top", regardless of the user's display resolution or Retina scale.
This is useful for model-driven apps because the model can reason about the screenshot in one stable coordinate space. It is also useful for scripts because the same action can run on different screens.
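To make the normalized space concrete, here is a minimal sketch of the arithmetic in both directions. It assumes simple linear scaling in which 999 maps to the far edge of a single display; the type and function names (SketchNormalizedPoint, sketchScreenPoint, sketchNormalized) are hypothetical and are not the package's API, which may also account for the visible frame and multiple displays.

```swift
import Foundation

// Hypothetical stand-in for a normalized point; not the package's own type.
struct SketchNormalizedPoint: Equatable {
    var x: Int
    var y: Int
}

/// Maps a 0-999 normalized point into a frame measured in screen points,
/// using a top-left origin. Assumes linear scaling; out-of-range input is clamped.
func sketchScreenPoint(from p: SketchNormalizedPoint, in frame: CGRect) -> CGPoint {
    let nx = Double(min(max(p.x, 0), 999)) / 999.0
    let ny = Double(min(max(p.y, 0), 999)) / 999.0
    return CGPoint(
        x: frame.origin.x + CGFloat(nx) * frame.size.width,
        y: frame.origin.y + CGFloat(ny) * frame.size.height
    )
}

/// Maps a pixel position inside a screenshot back into the 0-999 space.
/// Retina scale cancels out because only the ratio to the image size matters.
func sketchNormalized(fromPixel pixel: CGPoint, imageSize: CGSize) -> SketchNormalizedPoint {
    let nx = (Double(pixel.x) / Double(imageSize.width) * 999.0).rounded()
    let ny = (Double(pixel.y) / Double(imageSize.height) * 999.0).rounded()
    return SketchNormalizedPoint(
        x: min(max(Int(nx), 0), 999),
        y: min(max(Int(ny), 0), 999)
    )
}

// Example values for a display of 1512x982 points captured at 2x (3024x1964 pixels).
let frame = CGRect(x: 0, y: 0, width: 1512, height: 982)
let corner = sketchScreenPoint(from: SketchNormalizedPoint(x: 999, y: 999), in: frame)
let middle = sketchNormalized(
    fromPixel: CGPoint(x: 1512, y: 982),
    imageSize: CGSize(width: 3024, height: 1964)
)
print(corner, middle)
```

The second function is the direction a script typically needs when a target was picked by eye from a saved screenshot: the pixel position becomes a resolution-independent pair that can be passed to a click action.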
If you need lower-level access, you can ask MacControlKit to convert normalized coordinates into macOS screen points:
let point = try computer.screenPoint(from: NormalizedPoint(x: 500, y: 500))

MacControlKit uses normal macOS permission flows. It does not hide prompts or install privileged helpers.
Apps using this package usually need:
- Screen Recording permission to capture the screen.
- Accessibility permission to send mouse and keyboard input.
let status = await computer.permissions()
if !status.canCaptureScreen || !status.canControlInput {
    let updated = await computer.requestPermissions()
    print(updated)
}

On macOS, Screen Recording permission may not fully apply until the host app is restarted. MacControlKit reports permission state, but your app decides how to explain that to users.
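As one possible way for an app to explain missing permissions, the sketch below turns the two flags into user-facing hints. SketchPermissionStatus is a hypothetical stand-in that mirrors only the canCaptureScreen and canControlInput flags used in this README, and the hint wording is illustrative.

```swift
// Hypothetical stand-in that mirrors the two flags used in this README.
// It is not the package's PermissionStatus type.
struct SketchPermissionStatus {
    var canCaptureScreen: Bool
    var canControlInput: Bool
}

/// Returns one hint per missing permission, or an empty array when both are granted.
func missingPermissionHints(for status: SketchPermissionStatus) -> [String] {
    var hints: [String] = []
    if !status.canCaptureScreen {
        // Screen Recording may only take effect after the host app restarts.
        hints.append("Grant Screen Recording in System Settings > Privacy & Security, then restart the app.")
    }
    if !status.canControlInput {
        hints.append("Grant Accessibility in System Settings > Privacy & Security.")
    }
    return hints
}

let hints = missingPermissionHints(
    for: SketchPermissionStatus(canCaptureScreen: false, canControlInput: true)
)
for hint in hints {
    print(hint)
}
```

Keeping the message logic in application code like this matches the split described above: the library reports state, and the app owns the explanation.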
The main protocol is intentionally small:
public protocol ComputerControlling: Sendable {
    func permissions() async -> PermissionStatus
    func requestPermissions() async -> PermissionStatus
    func screenshot() async throws -> Screenshot
    func perform(_ action: ComputerAction) async throws
}

The default implementation is MacComputer:
public final class MacComputer: ComputerControlling {
    public init(options: MacComputerOptions = .default)
}

Actions are plain Swift values:
public enum ComputerAction: Sendable, Hashable {
    case click(NormalizedPoint)
    case rightClick(NormalizedPoint)
    case doubleClick(NormalizedPoint)
    case drag(from: NormalizedPoint, to: NormalizedPoint)
    case scroll(position: NormalizedPoint, delta: ScrollDelta)
    case type(String)
    case keyPress(key: String, modifiers: [KeyModifier])
    case openURL(URL)
    case launchApp(String)
}

Screenshots are just data:
public struct Screenshot: Sendable, Hashable {
    public let size: CGSize
    public let data: Data
    public let format: ScreenshotFormat
}

MacControlKit is useful for AI apps, but it is not an AI framework.
An agent loop can use it like this:
let screenshot = try await computer.screenshot()
let action = try await model.nextAction(
    screenshot: screenshot.data,
    coordinateSystem: .normalizedTopLeft999
)
try await computer.perform(action)

The loop, model, prompt, memory, safety policy, and stopping conditions are all application code. MacControlKit stays underneath that.
That boundary is the point. A test runner, a hand-written script, and a model-powered assistant should all be able to use the same control layer.
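A rough sketch of what that application-side loop might look like, with a step budget as the stopping condition and fakes standing in for both the computer and the model. Every type here (SketchAction, SketchComputer, SketchModel, FakeComputer, ScriptedModel) is a hypothetical stand-in so the example compiles on its own; a real app would presumably use the package's protocol and action types instead.

```swift
import Foundation

// Stand-in types so the sketch compiles on its own. They are NOT the package's API.
enum SketchAction: Hashable, Sendable {
    case click(x: Int, y: Int)
    case type(String)
}

protocol SketchComputer: Sendable {
    func screenshot() async throws -> Data
    func perform(_ action: SketchAction) async throws
}

protocol SketchModel: Sendable {
    /// Returns nil when the model considers the task finished.
    func nextAction(screenshot: Data) async throws -> SketchAction?
}

/// Capture, decide, act. Stops when the model returns nil or the step budget runs out.
/// Returns the actions that were actually performed.
func runLoop(
    computer: some SketchComputer,
    model: some SketchModel,
    maxSteps: Int
) async throws -> [SketchAction] {
    var performed: [SketchAction] = []
    for _ in 0..<maxSteps {
        let shot = try await computer.screenshot()
        guard let action = try await model.nextAction(screenshot: shot) else { break }
        try await computer.perform(action)
        performed.append(action)
    }
    return performed
}

/// Fake computer that returns empty screenshot data and ignores actions.
struct FakeComputer: SketchComputer {
    func screenshot() async throws -> Data { Data() }
    func perform(_ action: SketchAction) async throws {}
}

/// Fake model that replays a fixed script, then signals completion with nil.
actor ScriptedModel: SketchModel {
    private var remaining: [SketchAction]
    init(script: [SketchAction]) { remaining = script }
    func nextAction(screenshot: Data) async throws -> SketchAction? {
        remaining.isEmpty ? nil : remaining.removeFirst()
    }
}

let script: [SketchAction] = [.click(x: 500, y: 500), .type("hello")]
let performed = try await runLoop(
    computer: FakeComputer(),
    model: ScriptedModel(script: script),
    maxSteps: 5
)
print("Performed \(performed.count) actions")
```

Because the loop depends only on two small protocols, the same function can be driven by a scripted fake in a test or by a real model in an assistant, which is the reuse the boundary is meant to allow.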
MacControlKit is not:
- An agent runtime.
- A browser automation framework.
- A Selenium or Playwright replacement.
- A Lua or Python scripting environment.
- An MCP server.
- A cloud API client.
- A prompt library.
- A semantic Accessibility tree wrapper.
Those are good things to build. They just belong above this package.
There is a lot of good work around desktop automation already:
- Hammerspoon is a powerful macOS automation app with a Lua runtime.
- cliclick is a focused macOS command-line tool for mouse and keyboard events.
- PyAutoGUI and nut.js provide cross-platform automation APIs.
- usecomputer, CUA, and macOS-MCP target computer-use agents directly.
MacControlKit is narrower than those projects. It is meant to be a native Swift package that app developers can embed directly, without adopting a scripting runtime, Node/Python dependency, MCP server, VM layer, or agent framework.
MacControlKit is available under the Apache License, Version 2.0. See LICENSE.