tzafon/MacControlKit

MacControlKit

A Swift package for seeing and controlling a Mac screen.

import MacControlKit

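// Run this inside an async throwing function: the guard below uses `return`,
// and the later calls use `try await`.
let computer = MacComputer()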

let status = await computer.requestPermissions()
guard status.canCaptureScreen && status.canControlInput else {
    print("Screen Recording and Accessibility permissions are required.")
    return
}

let screenshot = try await computer.screenshot()
print("Captured \(screenshot.size.width)x\(screenshot.size.height)")

try await computer.perform(.click(.init(x: 500, y: 500)))
try await computer.perform(.type("hello"))
try await computer.perform(.keyPress(key: "l", modifiers: [.command]))
try await computer.perform(.openURL(URL(string: "https://example.com")!))
try await computer.perform(.launchApp("Safari"))

What Is MacControlKit?

MacControlKit is the control layer for Mac apps, command-line tools, and experiments that need to operate the user's computer with permission.

At its core, it is a small wrapper around the macOS APIs you usually have to stitch together yourself:

  • ScreenCaptureKit for screenshots.
  • Accessibility and TCC permission checks.
  • CoreGraphics events for mouse, keyboard, drag, and scroll.
  • AppKit and LaunchServices for opening URLs and launching apps.
  • Coordinate conversion for Retina displays and global screen coordinates.

MacControlKit does not plan tasks, talk to a model, run a cloud service, or ship a chat UI. It just gives you a clean way to capture the screen and send input.

Bring your own model, planner, UI, script, or test runner.

Why?

macOS already has the pieces, but the first few days of any "computer use" project tend to look the same:

  1. Figure out Screen Recording permission.
  2. Figure out Accessibility permission.
  3. Capture a screenshot.
  4. Convert screenshot coordinates into macOS screen points.
  5. Post mouse and keyboard events without getting tripped up by Retina scaling.
  6. Repeat the whole thing in the next app.

MacControlKit packages that foundation so app authors can start one layer higher.

Installation

MacControlKit supports macOS 15.2 and newer.

Xcode App

  1. Open your project in Xcode.
  2. Choose File > Add Package Dependencies.
  3. Enter https://github.com/tzafon/MacControlKit.git.
  4. Add the MacControlKit product to your app target.
  5. Set your target's macOS deployment target to 15.2 or newer.
  6. Import the package where you need it:

     import MacControlKit

Swift Package

// Package.swift
let package = Package(
    name: "YourPackage",
    platforms: [
        .macOS("15.2")
    ],
    dependencies: [
        .package(url: "https://github.com/tzafon/MacControlKit.git", from: "0.1.0")
    ],
    targets: [
        .executableTarget(
            name: "YourTool",
            dependencies: [
                .product(name: "MacControlKit", package: "MacControlKit")
            ]
        )
    ]
)

Then use it from your target:

import MacControlKit

let computer = MacComputer()
print(await computer.permissions())

Try It

You can try MacControlKit without adding it to a Swift package first:

git clone https://github.com/tzafon/MacControlKit.git
cd MacControlKit

swift run maccontrol help
swift run maccontrol permissions
swift run maccontrol request-permissions
swift run maccontrol screenshot ./screen.jpg

After Screen Recording and Accessibility permissions are granted, you can try input control:

swift run maccontrol click 500 500
swift run maccontrol type "hello"
swift run maccontrol key --modifier command l
swift run maccontrol open-url https://example.com
swift run maccontrol launch Safari

The swift run commands must be run from the package directory, where Package.swift lives.

For repeated local testing, build once and call the binary directly:

swift build
.build/debug/maccontrol permissions
.build/debug/maccontrol screenshot ./screen.jpg

The CLI is mostly executable documentation. It is useful for checking permissions, screenshots, and coordinates before embedding the library in your own app.

Documentation

MacControlKit uses Swift DocC for API documentation and guide pages.

In Xcode, open the package and choose Product > Build Documentation. The docs catalog lives at Sources/MacControlKit/MacControlKit.docc.

The GitHub Pages workflow publishes the rendered DocC site from main:

https://tzafon.github.io/MacControlKit/documentation/maccontrolkit/

The first guide pages cover:

  • Getting started
  • Permissions
  • Coordinates
  • Command-line usage
  • Building agents on top

Coordinates

By default, MacControlKit uses a normalized 0-999 coordinate system:

  • (0, 0) is the top-left of the visible screen.
  • (999, 999) is the bottom-right.
  • (500, 500) is roughly the center.

try await computer.perform(.click(.init(x: 485, y: 30)))

That means "about 48.5% from the left and 3% from the top", regardless of the user's display resolution or Retina scale.

This is useful for model-driven apps because the model can reason about the screenshot in one stable coordinate space. It is also useful for scripts because the same action can run on different screens.

If you need lower-level access, you can ask MacControlKit to convert normalized coordinates into macOS screen points:

let point = try computer.screenPoint(from: NormalizedPoint(x: 500, y: 500))
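
To make the mapping concrete, here is a minimal sketch of the arithmetic behind a 0-999 coordinate space. It is not MacControlKit's implementation: the function names are hypothetical, the frame is assumed to be in points with a top-left origin, and 999 is assumed to map to the far edge of the frame (this README does not say whether the library targets the edge or the last pixel).

```swift
import Foundation

/// Hypothetical helper (not part of MacControlKit): map a 0-999 normalized
/// coordinate onto a screen frame given in points with a top-left origin.
func screenPoint(normalizedX x: CGFloat, normalizedY y: CGFloat, in frame: CGRect) -> CGPoint {
    // Clamp so out-of-range input cannot land outside the frame.
    let cx = min(max(x, 0), 999)
    let cy = min(max(y, 0), 999)
    // 0 maps to the left/top edge, 999 to the right/bottom edge.
    return CGPoint(
        x: frame.origin.x + (cx / 999) * frame.size.width,
        y: frame.origin.y + (cy / 999) * frame.size.height
    )
}

/// Hypothetical inverse: turn a pixel position inside a screenshot into
/// 0-999 space. Dividing by the image size makes the Retina scale cancel out.
func normalized(pixelX: CGFloat, pixelY: CGFloat, imageSize: CGSize) -> (x: Int, y: Int) {
    let nx = Int(((pixelX / imageSize.width) * 999).rounded())
    let ny = Int(((pixelY / imageSize.height) * 999).rounded())
    // Clamp to the valid normalized range.
    return (min(max(nx, 0), 999), min(max(ny, 0), 999))
}
```

Because both directions work on ratios, the same normalized point stays valid whether the screenshot is 1440 or 2880 pixels wide.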

Permissions

MacControlKit uses normal macOS permission flows. It does not hide prompts or install privileged helpers.

Apps using this package usually need:

  • Screen Recording permission to capture the screen.
  • Accessibility permission to send mouse and keyboard input.

let status = await computer.permissions()

if !status.canCaptureScreen || !status.canControlInput {
    let updated = await computer.requestPermissions()
    print(updated)
}

On macOS, Screen Recording permission may not fully apply until the host app is restarted. MacControlKit reports permission state, but your app decides how to explain that to users.
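
Since explaining permission state is left to your app, one way to do it is to turn the two flags into a user-facing message. The sketch below is an illustration only: `PermissionFlags` is a local stand-in for the two properties this README reads from `PermissionStatus`, and both helper functions are hypothetical.

```swift
/// Local stand-in for the two flags read from `PermissionStatus` above.
/// Assumption: only these two Booleans matter for the message.
struct PermissionFlags {
    var canCaptureScreen: Bool
    var canControlInput: Bool
}

/// Hypothetical helper: names of the permissions that are still missing.
func missingPermissions(_ flags: PermissionFlags) -> [String] {
    var missing: [String] = []
    if !flags.canCaptureScreen { missing.append("Screen Recording") }
    if !flags.canControlInput { missing.append("Accessibility") }
    return missing
}

/// Hypothetical helper: a message for the user, or nil when nothing is missing.
func permissionMessage(_ flags: PermissionFlags) -> String? {
    let missing = missingPermissions(flags)
    guard !missing.isEmpty else { return nil }
    var message = "Grant " + missing.joined(separator: " and ")
        + " in System Settings > Privacy & Security."
    // Screen Recording may only take effect after the host app restarts.
    if !flags.canCaptureScreen {
        message += " Screen Recording may not apply until the app is restarted."
    }
    return message
}
```

In an app, you would build the flags from the status returned by `computer.permissions()` and show the message in your own UI.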

API

The main protocol is intentionally small:

public protocol ComputerControlling: Sendable {
    func permissions() async -> PermissionStatus
    func requestPermissions() async -> PermissionStatus
    func screenshot() async throws -> Screenshot
    func perform(_ action: ComputerAction) async throws
}

The default implementation is MacComputer:

public final class MacComputer: ComputerControlling {
    public init(options: MacComputerOptions = .default)
}

Actions are plain Swift values:

public enum ComputerAction: Sendable, Hashable {
    case click(NormalizedPoint)
    case rightClick(NormalizedPoint)
    case doubleClick(NormalizedPoint)
    case drag(from: NormalizedPoint, to: NormalizedPoint)
    case scroll(position: NormalizedPoint, delta: ScrollDelta)
    case type(String)
    case keyPress(key: String, modifiers: [KeyModifier])
    case openURL(URL)
    case launchApp(String)
}
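
Because actions are plain values, they are easy to build from data such as a model's JSON output. The following is a sketch under stated assumptions: `DemoAction` is a local stand-in that mirrors three of the cases above, and the JSON shape (`type`, `x`, `y`, `text`, `key`, `modifiers`) is a hypothetical wire format, not something MacControlKit defines.

```swift
import Foundation

/// Local stand-in mirroring three cases of `ComputerAction` (assumption).
enum DemoAction: Equatable {
    case click(x: Int, y: Int)
    case type(String)
    case keyPress(key: String, modifiers: [String])
}

/// Hypothetical wire format, e.g. {"type":"click","x":500,"y":500}.
struct WireAction: Decodable {
    let type: String
    let x: Int?
    let y: Int?
    let text: String?
    let key: String?
    let modifiers: [String]?
}

enum ActionDecodingError: Error, Equatable {
    case unknownType(String)
    case missingField(String)
    case outOfRange
}

/// Decode one JSON object into an action, rejecting anything malformed.
func decodeAction(from json: Data) throws -> DemoAction {
    let wire = try JSONDecoder().decode(WireAction.self, from: json)
    switch wire.type {
    case "click":
        guard let x = wire.x else { throw ActionDecodingError.missingField("x") }
        guard let y = wire.y else { throw ActionDecodingError.missingField("y") }
        // Enforce the normalized 0-999 range before anything reaches the screen.
        guard (0...999).contains(x), (0...999).contains(y) else {
            throw ActionDecodingError.outOfRange
        }
        return .click(x: x, y: y)
    case "type":
        guard let text = wire.text else { throw ActionDecodingError.missingField("text") }
        return .type(text)
    case "key":
        guard let key = wire.key else { throw ActionDecodingError.missingField("key") }
        return .keyPress(key: key, modifiers: wire.modifiers ?? [])
    default:
        throw ActionDecodingError.unknownType(wire.type)
    }
}
```

Validating at the decoding boundary keeps untrusted model output from turning into input events unchecked.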

Screenshots are just data:

public struct Screenshot: Sendable, Hashable {
    public let size: CGSize
    public let data: Data
    public let format: ScreenshotFormat
}
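
Since a screenshot is just bytes plus metadata, saving one needs nothing beyond Foundation. This is a minimal sketch: `saveScreenshotData` is a hypothetical helper, and the file extension is passed in by the caller because the cases of `ScreenshotFormat` are not listed in this README.

```swift
import Foundation

/// Hypothetical helper: write screenshot bytes to `<directory>/<name>.<ext>`
/// and return the resulting file URL.
func saveScreenshotData(
    _ data: Data,
    name: String,
    fileExtension: String,
    in directory: URL
) throws -> URL {
    // Create the target directory if it does not exist yet.
    try FileManager.default.createDirectory(
        at: directory,
        withIntermediateDirectories: true
    )
    let url = directory
        .appendingPathComponent(name)
        .appendingPathExtension(fileExtension)
    // Atomic write: readers never observe a half-written image.
    try data.write(to: url, options: .atomic)
    return url
}
```

With MacControlKit you would pass `screenshot.data` as the first argument.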

Building Agents On Top

MacControlKit is useful for AI apps, but it is not an AI framework.

An agent loop can use it like this:

let screenshot = try await computer.screenshot()

let action = try await model.nextAction(
    screenshot: screenshot.data,
    coordinateSystem: .normalizedTopLeft999
)

try await computer.perform(action)

The loop, model, prompt, memory, safety policy, and stopping conditions are all application code. MacControlKit stays underneath that.

That boundary is the point. A test runner, a hand-written script, and a model-powered assistant should all be able to use the same control layer.
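
As an illustration of how that application code might look, here is a sketch of a loop with two stopping conditions: the planner signals completion, or a step budget runs out. Everything in it is an assumption made so the sketch compiles alone: `LoopAction`, `DemoComputer`, and `DemoPlanner` are local stand-ins, not MacControlKit types. With the real package you would use `ComputerControlling`, `Screenshot`, and `ComputerAction` instead.

```swift
import Foundation

/// Local stand-in for `ComputerAction` (assumption, two cases only).
enum LoopAction: Equatable, Sendable {
    case click(x: Int, y: Int)
    case type(String)
}

/// Local stand-in for the capture and input half of `ComputerControlling`.
protocol DemoComputer {
    func screenshot() async throws -> Data
    func perform(_ action: LoopAction) async throws
}

/// Hypothetical planner: returns nil when it considers the task done.
/// A model, a hand-written script, or a test fixture can all conform.
protocol DemoPlanner {
    func nextAction(screenshot: Data) async throws -> LoopAction?
}

enum LoopOutcome: Equatable {
    case finished(steps: Int)
    case stepLimitReached
}

/// Capture, plan, act; stop when the planner is done or after `maxSteps` actions.
func runLoop(
    computer: DemoComputer,
    planner: DemoPlanner,
    maxSteps: Int
) async throws -> LoopOutcome {
    for step in 0..<maxSteps {
        let shot = try await computer.screenshot()
        guard let action = try await planner.nextAction(screenshot: shot) else {
            // The planner decided there is nothing left to do.
            return .finished(steps: step)
        }
        try await computer.perform(action)
    }
    // The budget ran out before the planner finished.
    return .stepLimitReached
}
```

The step budget is one example of a safety policy that belongs in the application rather than in the control layer.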

What This Is Not

MacControlKit is not:

  • An agent runtime.
  • A browser automation framework.
  • A Selenium or Playwright replacement.
  • A Lua or Python scripting environment.
  • An MCP server.
  • A cloud API client.
  • A prompt library.
  • A semantic Accessibility tree wrapper.

Those are good things to build. They just belong above this package.

Prior Art

There is already a lot of good work around desktop automation.

MacControlKit is narrower than those projects. It is meant to be a native Swift package that app developers can embed directly, without adopting a scripting runtime, Node/Python dependency, MCP server, VM layer, or agent framework.

License

MacControlKit is available under the Apache License, Version 2.0. See LICENSE.
