MacControlKit

A Swift package for seeing and controlling a Mac screen.

import Foundation
import MacControlKit

// Run this inside an async throwing function or Task; `return` exits it early.
let computer = MacComputer()

// Ask for Screen Recording and Accessibility access before doing anything else.
let status = await computer.requestPermissions()
guard status.canCaptureScreen, status.canControlInput else {
    print("Screen Recording and Accessibility permissions are required.")
    return
}

// Capture the screen and report its size.
let screenshot = try await computer.screenshot()
print("Captured \(screenshot.size.width)x\(screenshot.size.height)")

// Send input. Points use the normalized 0-999 coordinate system.
try await computer.perform(.click(.init(x: 500, y: 500)))
try await computer.perform(.type("hello"))
try await computer.perform(.keyPress(key: "l", modifiers: [.command]))
try await computer.perform(.openURL(URL(string: "https://example.com")!))
try await computer.perform(.launchApp("Safari"))

What Is MacControlKit?

MacControlKit is the control layer for Mac apps, command-line tools, and experiments that need to operate the user's computer with permission.

At its core, it is a small wrapper around the macOS APIs you usually have to stitch together yourself:

ScreenCaptureKit for screenshots.
Accessibility and TCC permission checks.
CoreGraphics events for mouse, keyboard, drag, and scroll.
AppKit and LaunchServices for opening URLs and launching apps.
Coordinate conversion for Retina displays and global screen coordinates.

MacControlKit does not plan tasks, talk to a model, run a cloud service, or ship a chat UI. It just gives you a clean way to capture the screen and send input.

Bring your own model, planner, UI, script, or test runner.

Why?

macOS already has the pieces, but the first few days of any "computer use" project tend to look the same:

Figure out Screen Recording permission.
Figure out Accessibility permission.
Capture a screenshot.
Convert screenshot coordinates into macOS screen points.
Post mouse and keyboard events without getting tripped up by Retina scaling.
Repeat the whole thing in the next app.

MacControlKit packages that foundation so app authors can start one layer higher.

Installation

MacControlKit supports macOS 15.2 and newer.

Xcode App

Open your project in Xcode.
Choose File > Add Package Dependencies.
Enter https://github.com/tzafon/MacControlKit.git.
Add the MacControlKit product to your app target.
Set your target's macOS deployment target to 15.2 or newer.
Import the package where you need it:

import MacControlKit

Swift Package

// Package.swift
let package = Package(
    name: "YourPackage",
    platforms: [
        .macOS("15.2")
    ],
    dependencies: [
        .package(url: "https://github.com/tzafon/MacControlKit.git", from: "0.1.0")
    ],
    targets: [
        .executableTarget(
            name: "YourTool",
            dependencies: [
                .product(name: "MacControlKit", package: "MacControlKit")
            ]
        )
    ]
)

Then use it from your target:

import MacControlKit

let computer = MacComputer()
print(await computer.permissions())

Try It

You can try MacControlKit without adding it to a Swift package first:

git clone https://github.com/tzafon/MacControlKit.git
cd MacControlKit

swift run maccontrol help
swift run maccontrol permissions
swift run maccontrol request-permissions
swift run maccontrol screenshot ./screen.jpg

After Screen Recording and Accessibility permissions are granted, you can try input control:

swift run maccontrol click 500 500
swift run maccontrol type "hello"
swift run maccontrol key --modifier command l
swift run maccontrol open-url https://example.com
swift run maccontrol launch Safari

Run swift run from the package directory, where Package.swift lives.

For repeated local testing, build once and call the binary directly:

swift build
.build/debug/maccontrol permissions
.build/debug/maccontrol screenshot ./screen.jpg

The CLI is mostly executable documentation. It is useful for checking permissions, screenshots, and coordinates before embedding the library in your own app.

Documentation

MacControlKit uses Swift DocC for API documentation and guide pages.

In Xcode, open the package and choose Product > Build Documentation. The docs catalog lives at Sources/MacControlKit/MacControlKit.docc.

The GitHub Pages workflow publishes the rendered DocC site from main:

https://tzafon.github.io/MacControlKit/documentation/maccontrolkit/

The first guide pages cover:

Getting started
Permissions
Coordinates
Command-line usage
Building agents on top

Coordinates

By default, MacControlKit uses a normalized 0-999 coordinate system:

(0, 0) is the top-left of the visible screen.
(999, 999) is the bottom-right.
(500, 500) is roughly the center.

try await computer.perform(.click(.init(x: 485, y: 30)))

That means "about 48.5% from the left and 3% from the top", regardless of the user's display resolution or Retina scale.

This is useful for model-driven apps because the model can reason about the screenshot in one stable coordinate space. It is also useful for scripts because the same action can run on different screens.

If you need lower-level access, you can ask MacControlKit to convert normalized coordinates into macOS screen points:

let point = try computer.screenPoint(from: NormalizedPoint(x: 500, y: 500))
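
As a rough illustration of what such a conversion involves, the sketch below maps a normalized 0-999 point onto a screen frame measured in points. It is not MacControlKit's implementation: the `Point` and `Frame` types and the `screenPoint(fromNormalizedX:y:in:)` function are hypothetical names, and it assumes a top-left origin with 999 landing on the far edge of the frame.

```swift
// Hypothetical stand-in types; MacControlKit's real types may differ.
struct Point: Equatable {
    var x: Double
    var y: Double
}

struct Frame {
    var originX: Double   // left edge in global screen points
    var originY: Double   // top edge in global screen points
    var width: Double     // width in points, not pixels
    var height: Double    // height in points, not pixels
}

/// Maps a normalized 0-999 coordinate onto a frame measured in points.
/// Assumption: 0 is the left/top edge and 999 is the right/bottom edge.
func screenPoint(fromNormalizedX x: Int, y: Int, in frame: Frame) -> Point {
    // Clamp so out-of-range input still lands on the screen.
    let clampedX = Double(min(max(x, 0), 999))
    let clampedY = Double(min(max(y, 0), 999))
    return Point(
        x: frame.originX + clampedX / 999.0 * frame.width,
        y: frame.originY + clampedY / 999.0 * frame.height
    )
}

// A 1440x900-point display. A Retina panel has more pixels but the same point size.
let display = Frame(originX: 0, originY: 0, width: 1440, height: 900)
let corner = screenPoint(fromNormalizedX: 999, y: 999, in: display)
print(corner.x, corner.y)
```

Because the mapping works in points rather than pixels, the Retina scale factor never enters the arithmetic. A display that is not at the global origin would only change `originX` and `originY`.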

Permissions

MacControlKit uses normal macOS permission flows. It does not hide prompts or install privileged helpers.

Apps using this package usually need:

Screen Recording permission to capture the screen.
Accessibility permission to send mouse and keyboard input.

let status = await computer.permissions()

if !status.canCaptureScreen || !status.canControlInput {
    let updated = await computer.requestPermissions()
    print(updated)
}

On macOS, Screen Recording permission may not fully apply until the host app is restarted. MacControlKit reports permission state, but your app decides how to explain that to users.
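
One way an app might turn the reported state into user-facing guidance is sketched below. The `PermissionSnapshot` type is a stand-in that mirrors only the two flags used above, and `guidance(for:)` and the message wording are assumptions for illustration, not part of MacControlKit.

```swift
// Stand-in for the two flags shown above; not MacControlKit's PermissionStatus type.
struct PermissionSnapshot {
    var canCaptureScreen: Bool
    var canControlInput: Bool
}

/// Returns one message per missing permission, or an empty array if all are granted.
func guidance(for status: PermissionSnapshot) -> [String] {
    var messages: [String] = []
    if !status.canCaptureScreen {
        // Screen Recording may need a restart of the host app before it applies.
        messages.append("Grant Screen Recording permission in System Settings, then restart the app.")
    }
    if !status.canControlInput {
        messages.append("Grant Accessibility permission in System Settings.")
    }
    return messages
}

let missingCapture = PermissionSnapshot(canCaptureScreen: false, canControlInput: true)
for message in guidance(for: missingCapture) {
    print(message)
}
```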

API

The main protocol is intentionally small:

public protocol ComputerControlling: Sendable {
    func permissions() async -> PermissionStatus
    func requestPermissions() async -> PermissionStatus
    func screenshot() async throws -> Screenshot
    func perform(_ action: ComputerAction) async throws
}

The default implementation is MacComputer:

public final class MacComputer: ComputerControlling {
    public init(options: MacComputerOptions = .default)
}

Actions are plain Swift values:

public enum ComputerAction: Sendable, Hashable {
    case click(NormalizedPoint)
    case rightClick(NormalizedPoint)
    case doubleClick(NormalizedPoint)
    case drag(from: NormalizedPoint, to: NormalizedPoint)
    case scroll(position: NormalizedPoint, delta: ScrollDelta)
    case type(String)
    case keyPress(key: String, modifiers: [KeyModifier])
    case openURL(URL)
    case launchApp(String)
}
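
Because actions are values, a script can be an ordinary array that is built, logged, and compared before anything is performed. The sketch below shows the idea with `SketchAction` and `SketchPoint`, stand-in types that mirror a few cases of the enum above so the example runs without the package. The `describe(_:)` helper is hypothetical.

```swift
// Stand-ins mirroring a few cases of the enum above; not MacControlKit's types.
struct SketchPoint: Hashable {
    var x: Int
    var y: Int
}

enum SketchAction: Hashable {
    case click(SketchPoint)
    case type(String)
    case launchApp(String)
}

/// Describes an action for a log or dry run, without performing it.
func describe(_ action: SketchAction) -> String {
    switch action {
    case .click(let point):
        return "click at (\(point.x), \(point.y))"
    case .type(let text):
        return "type \"\(text)\""
    case .launchApp(let name):
        return "launch \(name)"
    }
}

// A script is just an array of values that can be inspected and replayed.
let script: [SketchAction] = [
    .launchApp("Safari"),
    .click(SketchPoint(x: 485, y: 30)),
    .type("hello"),
]

for action in script {
    print(describe(action))
}

// Hashable conformance makes equality checks and de-duplication straightforward.
let unique = Set(script + script)
print(unique.count)
```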

Screenshots are just data:

public struct Screenshot: Sendable, Hashable {
    public let size: CGSize
    public let data: Data
    public let format: ScreenshotFormat
}
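
Since a screenshot carries encoded bytes plus a format, saving one is a plain file write. The sketch below assumes hypothetical `SketchScreenshot` and `SketchFormat` stand-ins, because the real `ScreenshotFormat` cases are not shown here, and a hypothetical `save(_:named:in:)` helper.

```swift
import Foundation

// Hypothetical stand-ins; the real ScreenshotFormat cases may differ.
enum SketchFormat: String {
    case jpeg = "jpg"
    case png = "png"
}

struct SketchScreenshot {
    var data: Data
    var format: SketchFormat
}

/// Writes the encoded bytes into `directory`, using the format as the file extension.
func save(_ shot: SketchScreenshot, named name: String, in directory: URL) throws -> URL {
    let destination = directory
        .appendingPathComponent(name)
        .appendingPathExtension(shot.format.rawValue)
    try shot.data.write(to: destination, options: .atomic)
    return destination
}

// Placeholder bytes stand in for real image data.
let shot = SketchScreenshot(data: Data([0xFF, 0xD8, 0xFF]), format: .jpeg)
let savedURL = try save(shot, named: "screen", in: FileManager.default.temporaryDirectory)
print(savedURL.lastPathComponent)
```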

Building Agents On Top

MacControlKit is useful for AI apps, but it is not an AI framework.

An agent loop can use it like this:

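// Observe: capture the current screen.
let screenshot = try await computer.screenshot()

// Decide: `model` is your own code; MacControlKit does not provide it.
let action = try await model.nextAction(
    screenshot: screenshot.data,
    coordinateSystem: .normalizedTopLeft999
)

// Act: perform whatever the model chose.
try await computer.perform(action)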

The loop, model, prompt, memory, safety policy, and stopping conditions are all application code. MacControlKit stays underneath that.

That boundary is the point. A test runner, a hand-written script, and a model-powered assistant should all be able to use the same control layer.
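
A stopping condition is one piece of application code that every such loop needs. The sketch below shows a step-bounded observe, decide, act loop under simplifying assumptions: `Decision` and `runLoop` are hypothetical names, the closures stand in for the screenshot, model, and perform calls, and everything is synchronous for brevity even though MacControlKit's own methods are async.

```swift
// Hypothetical stand-ins so the sketch runs on its own; real code would call
// MacControlKit's async screenshot() and perform(_:) instead of these closures.
enum Decision: Equatable {
    case act(String)   // an action to perform, identified by a label here
    case done          // the planner considers the task finished
}

/// Runs observe, decide, act until the planner finishes or the step budget runs out.
/// Returns the labels of the actions that were performed.
func runLoop(
    maxSteps: Int,
    observe: () -> String,
    decide: (String) -> Decision,
    act: (String) -> Void
) -> [String] {
    var performed: [String] = []
    for _ in 0..<maxSteps {
        let observation = observe()
        switch decide(observation) {
        case .done:
            return performed
        case .act(let label):
            act(label)
            performed.append(label)
        }
    }
    return performed   // budget exhausted: stop rather than loop forever
}

// A scripted planner standing in for a model: two actions, then done.
var remaining = ["click", "type"]
let history = runLoop(
    maxSteps: 10,
    observe: { "screenshot" },
    decide: { _ in
        if remaining.isEmpty { return .done }
        return .act(remaining.removeFirst())
    },
    act: { label in print("performing \(label)") }
)
print(history)
```

The same loop shape works whether the decision comes from a model, a recorded script, or a test fixture, which is why it lives above the control layer.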

What This Is Not

MacControlKit is not:

An agent runtime.
A browser automation framework.
A Selenium or Playwright replacement.
A Lua or Python scripting environment.
An MCP server.
A cloud API client.
A prompt library.
A semantic Accessibility tree wrapper.

Those are good things to build. They just belong above this package.

Prior Art

There is a lot of good work around desktop automation already:

Hammerspoon is a powerful macOS automation app with a Lua runtime.
cliclick is a focused macOS command-line tool for mouse and keyboard events.
PyAutoGUI and nut.js provide cross-platform automation APIs.
usecomputer, CUA, and macOS-MCP target computer-use agents directly.

MacControlKit is narrower than those projects. It is meant to be a native Swift package that app developers can embed directly, without adopting a scripting runtime, Node/Python dependency, MCP server, VM layer, or agent framework.

License

MacControlKit is available under the Apache License, Version 2.0. See LICENSE.
