Could computer windows organize themselves?
Sep 21 2020
Personal computing has grown by leaps and bounds in power, accessibility, and social prominence since the introduction of the first successful consumer-grade computer in 1974. Despite the emergence of new form factors, input methods, and even entirely new interaction paradigms in smartphones, tablets, and VR/AR, traditional computers continue to rely on an interface metaphor birthed in the early days of the personal computing movement: the desktop.
Though undoubtedly robust, the windowed desktop is not without its weaknesses. Our computational demands have reached a point where we can have dozens upon dozens of windows, and even more tabs, open at any given time, and traditional desktops require us to position, resize, and select these windows with a cursor. Modern operating systems provide several tools to reduce the cognitive load and monotony associated with window management, but these tools do not solve the core problems of the modern desktop model.
This limitation grows even more problematic when we consider that window layouts represent not only spatial arrangements but also semantic and functional ones. Computer "multi-tasking" is, in many circumstances, a misnomer; when we multi-task (or, more accurately, multi-app), we are often less concerned with doing two things at once than we are with doing a complex task made up of multiple components, linked in a complex non-linear flow. As a result, managing windows at such a low level, as we do now, can feel like fighting with our computers to get work done.
At the same time, multi-touch has emerged as one of the most promising and rapidly evolving input technologies of the past decade. Companies like Apple and Microsoft have explored building hybrid, touch-driven PCs to bring this technology to more conventional computing experiences. The Microsoft Surface, arguably the first mainstream hybrid computer, is functionally a traditional laptop with multi-touch provided as a secondary input method. In contrast, the iPad Pro, Apple's flagship iPad, is first and foremost a tablet that integrates input methods from conventional laptops.
These devices are distinguished not by their form factors—which are quite similar—but instead by their software and interfaces. The Surface runs Windows, a traditional desktop OS built around cursor input, while the iPad runs iPadOS, an evolution of an operating system built for touchscreens. Each product straddles the line between a traditional desktop and a tablet, and each falls short of merging the two.
Current interface metaphors function better with some modes of input than with others; the desktop was designed around keyboard and mouse use, and touch interfaces were designed around multi-touch. Building an interface for multi-modal input requires a wholly different interface paradigm, built from the ground up to take advantage of a wide range of mixed interaction styles. No product currently on the market has been able to make good on a genuinely multi-modal approach.
In 2018, I began to develop a new interface concept that would address the window management issues of the traditional desktop and serve as a unifying metaphor for interaction across devices in both the conventional and hybrid computing space. My objective was to produce a near-to-market concept that could readily take advantage of existing hardware and technology while introducing a more ambitious rethinking of the modern desktop. I call it Carousel.
Spatial organization is a crucial part of getting work done. We work best when our tools are within reach and configured in a meaningful way; arrangements in physical space provide powerful psychological and semantic cues that help us better retain information and place ourselves in the right frame of mind. This concept applies at multiple scales: commuting to and from work, for example, creates not only physical distance but psychological distance between your home and work life. A considered work environment is tailored to the task at hand and sets aside extraneous distractions and concerns.
By contrast, disorganized spaces can get in the way of immersing ourselves in our work. Stacks of papers, misplaced keys, a browser window with some tabs you forgot to close: these are just some of the things that can cause friction when getting to work. Finding the things we need, and setting aside the things we don't, is the foundation for creating a greater degree of focus and intention.
With the desktop metaphor, our computers have brought with them few of the physical desk's virtues and many of its worst vices. A messy desktop is the digital equivalent of a desk covered in a mess of papers: this similarity is, unfortunately, by design.
Desktops allow users to drag overlapping windows to specific X-Y positions on a flat plane. Why? Is there any meaningful benefit to moving a window two pixels to the left or right? If not, why do our computers give us this degree of control over our windows and, by extension, saddle us with this degree of tedium?
Carousel provides a better way to build and manage meaningful digital workspaces that takes cues from how we construct productive environments in the physical world. It frees the user's attention and cognitive bandwidth from the arbitrary friction of a traditional overlapping desktop and works with the user to provide the right tools and context for focusing on the task at hand.
Carousel is made up of four components, built in three layers. Together, these building blocks form a different approach to the conventional computing experience—one that maps more closely to how we actually work.
On a fundamental level, we arrange windows to establish relationships between tools—conventionally in the form of applications—and various bodies of information. We put a PDF next to our word processor to reference an author's work while writing a paper; we move files from one browser window to another to archive them for safekeeping; we add photos to a design to include them in a publication. On a conventional desktop, managing these connections requires repeatedly dragging and resizing windows to configure them into a usable view.
Carousel streamlines the creation and management of these spatial-informational relationships through tiling window management (TWM). A tiling window manager does not allow windows to overlap; instead, it automatically divides the entire screen area among the windows in the current workspace. Unlike on a standard desktop, users do not have to manually move and resize windows on an arbitrary X-Y plane. A user instead directly defines higher-level relationships between windows, like the relative orientation of one window to another and the proportion of screen area taken up by each. Windows in Carousel are movable in ways relevant to actually getting work done and not movable in ways that are not.
Further, Carousel uses a binary space partitioning (BSP) model to drive its initial window spawning behavior. In BSP, a new window spawns by taking up half of the space occupied by the currently active window. This procedure provides a predictable default that streamlines the initial window launching process; should a user require a different setup, they can quickly rearrange application windows into their desired relationship.
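As a rough illustration of this spawning rule, here is a minimal Python sketch. It is not Carousel's implementation: the `Rect` type and the `spawn_beside` function are hypothetical names, and both the choice to split along the active window's longer side and the idea that a newly launched window becomes the active one are assumptions made for the example. The concept itself only specifies that a new window takes half of the active window's space.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """A window's screen area in pixels (hypothetical type for this sketch)."""
    x: float
    y: float
    w: float
    h: float


def spawn_beside(active: Rect) -> Tuple[Rect, Rect]:
    """Halve the active window's area; return (shrunk active, new window).

    Assumption: the split runs along the longer side of the active window.
    """
    if active.w >= active.h:
        half = active.w / 2
        return (Rect(active.x, active.y, half, active.h),
                Rect(active.x + half, active.y, half, active.h))
    half = active.h / 2
    return (Rect(active.x, active.y, active.w, half),
            Rect(active.x, active.y + half, active.w, half))


# A blank workspace: the first window fills a 1920x1080 screen.
first = Rect(0, 0, 1920, 1080)
# Launching a second app splits the first window's area in half.
first, second = spawn_beside(first)
# Assumption: the new window is now active, so a third launch splits it.
second, third = spawn_beside(second)
for name, rect in [("first", first), ("second", second), ("third", third)]:
    print(name, rect)
```

Every launch reuses the same rule, so the three windows always cover the whole screen with no gaps and no overlap.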
Through this approach, Carousel is able to represent all window arrangements it produces with a tree. Modifications to the window layout will alter the tree, and changing the tree will modify the layout. This infrastructure provides a framework for controlling, streamlining, and automating complex window arrangements.
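To make the tree-to-layout relationship concrete, here is a small Python sketch under stated assumptions. The `Window` and `Split` classes, the `layout` function, and the `ratio` field are hypothetical names chosen for illustration, not part of Carousel. The sketch shows one direction of the relationship: the on-screen rectangles are computed entirely from the tree, so editing the tree changes the layout.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Window:
    title: str


@dataclass
class Split:
    direction: str        # "horizontal" = side by side, "vertical" = top/bottom
    first: "Node"
    second: "Node"
    ratio: float = 0.5    # share of the area given to `first`


Node = Union[Window, Split]
Box = Tuple[float, float, float, float]  # (x, y, width, height)


def layout(node: Node, x: float, y: float, w: float, h: float) -> Dict[str, Box]:
    """Walk the tree and assign every window a non-overlapping rectangle."""
    if isinstance(node, Window):
        return {node.title: (x, y, w, h)}
    if node.direction == "horizontal":
        first_w = w * node.ratio
        boxes = layout(node.first, x, y, first_w, h)
        boxes.update(layout(node.second, x + first_w, y, w - first_w, h))
    else:
        first_h = h * node.ratio
        boxes = layout(node.first, x, y, w, first_h)
        boxes.update(layout(node.second, x, y + first_h, w, h - first_h))
    return boxes


tree = Split("horizontal", Window("editor"),
             Split("vertical", Window("pdf"), Window("browser")))
print(layout(tree, 0, 0, 1600, 900))

# Changing the tree changes the layout: give the editor 75% of the width.
tree.ratio = 0.75
print(layout(tree, 0, 0, 1600, 900))
```

Because the layout is a pure function of the tree, anything that can edit the tree (a keyboard command, a gesture, or an automation script) can control the arrangement.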
Atop this base is the interaction layer, which establishes an intelligible relationship between Carousel's window management infrastructure and the user. It is made up of two related components: Carousel's card UI and its input command system.
Carousel utilizes a card metaphor to represent windows and describe their behavior. Cards behave analogously to their real-life counterparts: they can be placed, rearranged, and stacked. These metaphorical actions are brought to life in Carousel through a consistent vocabulary of motion, designed to build a user's intuition through repeated exposure.
This card interface works in parallel with a command system designed to accept a wide range of input combinations. Commands consist of three elements: a primary modifier, a secondary modifier, and an exit key.
Primary modifiers declare the scope of action—that is, whether a command is being directed to Carousel (Control) or the active window (Command). Secondary modifiers distinguish commands with the same exit key, and exit keys complete a command.
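The three-part structure can be sketched as a tiny parser in Python. This is an illustration only: `Command`, `parse_command`, and the scope labels are hypothetical names. Treating the secondary modifier as optional, and treating Option as the only secondary modifier, are assumptions drawn from the key combinations used as examples in this article.

```python
from dataclasses import dataclass
from typing import Optional

# Primary modifiers declare the scope of a command.
SCOPES = {"Control": "carousel", "Command": "active window"}
# Assumption: Option is the only secondary modifier.
SECONDARY_MODIFIERS = {"Option"}


@dataclass(frozen=True)
class Command:
    scope: str
    secondary: Optional[str]
    exit_key: str


def parse_command(chord: str) -> Command:
    """Parse a chord such as 'Control + Option + S' into its elements."""
    keys = [key.strip() for key in chord.split("+")]
    if len(keys) < 2 or keys[0] not in SCOPES:
        raise ValueError(f"expected a primary modifier first: {chord!r}")
    primary, *middle, exit_key = keys
    if len(middle) > 1 or any(m not in SECONDARY_MODIFIERS for m in middle):
        raise ValueError(f"unrecognized secondary modifier in {chord!r}")
    if not exit_key:
        raise ValueError(f"missing exit key in {chord!r}")
    return Command(SCOPES[primary], middle[0] if middle else None, exit_key)


for chord in ["Control + Space", "Control + Option + S", "Command + T"]:
    print(chord, "->", parse_command(chord))
```

The first key alone tells you who receives the command, which is what keeps shortcuts predictable as more of them are added.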
All commands within Carousel are mapped consistently under this paradigm. They can be accessed using just a keyboard or in tandem with another input device, like a mouse, trackpad, or touchscreen.
Carousel's command system brings clarity to the often inconsistent structure of traditional desktop shortcuts by mirroring the hierarchy of on-screen information. This hierarchy also applies to multi-touch: three- and four-finger commands are used for window management, while one- and two-finger commands execute in-app actions like scrolling, zooming, and panning. This approach resolves much of the confusion with touch commands on devices like the iPad, which attempt to distinguish one command from another by factors that can be difficult to predict or engage reliably, such as timing, velocity, or touch zones.
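The touch half of this hierarchy is simple enough to express as a single lookup. The Python sketch below uses a hypothetical `touch_scope` function; rejecting gestures with five or more fingers is an assumption, since the concept only describes one to four.

```python
def touch_scope(finger_count: int) -> str:
    """Route a gesture by finger count alone, not by timing or touch zone."""
    if finger_count in (1, 2):
        return "active window"   # scrolling, zooming, panning
    if finger_count in (3, 4):
        return "carousel"        # window management
    # Assumption: other finger counts have no defined meaning.
    raise ValueError(f"no gesture defined for {finger_count} fingers")


for fingers in range(1, 5):
    print(fingers, "->", touch_scope(fingers))
```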
Working in Carousel begins with a blank workspace mapped to the entire screen. To launch an application or file, press Control + Space and type into the action bar. The application will launch and take up all available screen space relative to its launch point. Because we have a blank workspace with no other windows, it automatically fills the entire screen.
To create a side-by-side layout, we repeat this process. Carousel will automatically split the space between the previously active application and your new one. Repeating once more creates a three-window layout.
Through one straightforward interaction pattern, Carousel enables users to create a usable and neatly organized three-window layout, with no wasted space or overlap.
In the previous example, we relied on Carousel's default window spawning behavior to automatically introduce and arrange windows. While this default is more than adequate for the vast majority of use cases, there are times when greater control over how windows spawn is needed.
Let's look at how to create a two-app layout with one application below the other rather than side-by-side. With one window already open, press Control + Option + Up and enter the name of what you want to launch. On a trackpad or touchscreen, hold Control and swipe up with three fingers.
In the previous two examples, we were working with a small number of windows. But what if we begin to introduce more? This is where Carousel's stacking functionality provides a powerful tool for window management.
Carousel has a universal tab system that allows users to create a tabbed view of as many applications as they desire. As with all commands we've explored previously, stacking can be accessed from whatever input combination you have available. With a keyboard, press Control + S to stack all windows at or below the active window's current level in the layout tree. You can continue adding to a stack until all windows in the workspace have been grouped. To stack all windows at once, press Control + Option + S.
To stack windows with touch input, hold Control while pinching in with four fingers. Holding Control and Option while pinching will stack all windows at once.
Once a stack has been created, you can launch new applications directly as a new tab in the stack by pressing Control + T, or by pressing the plus button when hovering over the tab stack. Command + T launches a new tab of the current app.
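One way to read the stacking behavior is as an operation on the layout tree. The Python sketch below is an interpretation, not a specification: `Window`, `Stack`, `Split`, `stack_at`, and `stack_all` are hypothetical names, and it assumes that "at or below the active window's level" means every window under the split that directly contains the active item.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Window:
    title: str


@dataclass
class Stack:
    tabs: List[Window]    # a tabbed group that occupies one tile


@dataclass
class Split:
    first: "Node"
    second: "Node"


Node = Union[Window, Stack, Split]


def leaves(node: Node) -> List[Window]:
    """Collect every window under a node, left to right."""
    if isinstance(node, Window):
        return [node]
    if isinstance(node, Stack):
        return list(node.tabs)
    return leaves(node.first) + leaves(node.second)


def stack_at(root: Node, active: Node) -> Node:
    """Collapse the split directly above `active` into one Stack (Control + S)."""
    if not isinstance(root, Split):
        return root  # a lone window or stack has nothing to collapse
    if root.first is active or root.second is active:
        return Stack(leaves(root))
    return Split(stack_at(root.first, active), stack_at(root.second, active))


def stack_all(root: Node) -> Stack:
    """Group every window in the workspace (Control + Option + S)."""
    return Stack(leaves(root))


editor, pdf, browser = Window("editor"), Window("pdf"), Window("browser")
tree = Split(editor, Split(pdf, browser))

once = stack_at(tree, browser)        # pdf and browser become tabs
twice = stack_at(once, once.second)   # stacking again absorbs the editor
print(once)
print(twice)
```

Under this reading, repeating the command climbs one level of the tree each time until the whole workspace is a single stack, which matches the end state of stacking everything at once.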
The power of Carousel's window management approach is fully realized with the addition of its final component: workspace management. Like the layers before it, workspace management is designed to allow users to easily craft relationships between tools and bodies of information, but this time at a level above a single fullscreen view. You can view workspaces in context and by relationship—both through automated defaults and user-created connections.
In this view, Carousel provides a summary of both your active workspaces and your workspace history. Because Carousel is tree-based, it can easily recreate past spaces on demand. This view also allows you to create groups: collections of individual workspaces dedicated to a larger and more complex task, project, or idea. Once a group is created, you can summon and dismiss it just like any workspace.
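A short Python sketch shows why a tree makes history cheap to keep. It assumes, purely for illustration, that a layout tree is stored as nested dictionaries and that snapshots are saved as JSON text; `WorkspaceHistory` and its methods are hypothetical names.

```python
import json
from typing import List


class WorkspaceHistory:
    """Keep serialized snapshots of layout trees and rebuild them on demand."""

    def __init__(self) -> None:
        self._snapshots: List[str] = []

    def record(self, tree: dict) -> int:
        """Store a copy of the tree as JSON and return its index in the history."""
        self._snapshots.append(json.dumps(tree))
        return len(self._snapshots) - 1

    def recreate(self, index: int) -> dict:
        """Rebuild a past workspace's tree from its snapshot."""
        return json.loads(self._snapshots[index])


# Assumed tree shape: a split node holds two children, a leaf names a window.
tree = {
    "split": "horizontal",
    "ratio": 0.5,
    "children": [{"window": "editor"}, {"window": "pdf"}],
}

history = WorkspaceHistory()
saved = history.record(tree)

# Later edits to the live workspace do not touch the snapshot.
tree["children"][1] = {"window": "browser"}
print(history.recreate(saved))
```

A whole workspace reduces to a small piece of structured data, so storing many past arrangements costs very little.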
In a final level above groups, Carousel allows you to create contexts: a superset of groups and workspaces dedicated to a broad area of responsibility that can be assigned to a physical location using GPS or near-field data. A context has its own history and is designed to create a layer of high-level separation between various areas of your life.
If a user assigns a location and gives Carousel permission, Carousel can automatically switch to a context when the device is brought to that location. You can maintain separate work and personal life contexts, for example, and have Carousel switch between them when it detects that you have moved to a different space. In this way, Carousel's digital relationships can map onto spatial relationships in the physical world.
This final component combines with the previous three to create a flexible digital work environment that scales and shifts with your needs.
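Location-based switching could work along the following lines. This Python sketch is a guess at the mechanics, not a description of Carousel: the `Context` fields, the `context_for` function, the per-context radius, and the example coordinates are all hypothetical, and the haversine great-circle distance stands in for whatever GPS or near-field matching a real system would use.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass(frozen=True)
class Context:
    name: str
    lat: float        # degrees
    lon: float        # degrees
    radius_m: float   # assumed: how close counts as "at" this location


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def context_for(lat: float, lon: float, contexts: List[Context],
                permitted: bool) -> Optional[Context]:
    """Return the nearest context whose radius covers the device, if any."""
    if not permitted:
        return None  # never switch without the user's permission
    in_range = [c for c in contexts
                if distance_m(lat, lon, c.lat, c.lon) <= c.radius_m]
    return min(in_range,
               key=lambda c: distance_m(lat, lon, c.lat, c.lon),
               default=None)


# Arbitrary example coordinates, roughly 2.5 km apart.
contexts = [
    Context("work", 40.7128, -74.0060, radius_m=100),
    Context("home", 40.7306, -73.9866, radius_m=100),
]
print(context_for(40.7129, -74.0060, contexts, permitted=True))
print(context_for(40.7129, -74.0060, contexts, permitted=False))
```

Returning nothing when no context is in range leaves the current context in place, which seems the least surprising behavior while moving between locations.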
Carousel was born out of frustration with the current state of computer interfaces. Relative to the pace of innovation in modern hardware, it felt out of place for me to see new form factors united with UX concepts that held them back from their full potential.
Computation brims with tantalizing possibilities of what could be: a kind of power that borders on magical—and yet we often remain content to live in its past. What if we imagined differently? How could we build new, better ways of relating to our digital lives?
Carousel was, for me, an attempt to explore a small part of these larger questions. I hope that what I have presented is a representative, if abridged, cross-section of the core thinking behind the concept.
I want to continue developing Carousel as a concept, ideally at a higher fidelity and in greater detail. The ultimate goal would be to bring something like it to life at the heart of a real device. On that path, there are many more ideas to explore and assumptions to question.
I owe a great debt of gratitude to those who inspired me to take on this challenge: the designers, artists, engineers, and thinkers who have, at one time or another, considered these questions and produced interesting and provoking answers. I hope this concept inspires conversation and curiosity about the future of computing, as the work of others has done for me. For my part, I will continue to explore the possibilities of what lies ahead.
- Some of the concepts and products that initially inspired Carousel are Clayton Miller's 10GUI, Ivan Sutherland's Sketchpad, The Soulmen's Daedalus Touch, Microsoft's Courier Concept, and the venerable i3. Revisiting the concept in the present time, I am reminded of the ideas and ambitions presented by Jason Yuan's Mercury OS and Soonho Kwon's Sessions, which have tackled similar challenges from varying scales and perspectives.
- Interface metaphors offer power by mimicking something we see in everyday life and tapping into our understanding of how those objects behave. Merely establishing a metaphor, however, does not give value or guidance in itself: it must be consistent and thought through. A metaphor can do more harm than good if it implies particular interaction patterns that contradict or otherwise don't align with what an interface is actually capable of.
- Carousel was built on the assumption that at least some form of keyboard input would be available on the device running it. Adapting the interface for touch-only input would not be overly difficult, however; hardware substitutes for the two primary modifier keys and a touch-only iteration of the application launcher would be the key elements required. For the former, I will simply say that those who truly care about the software experience will build their own hardware.
- I conducted user testing of Carousel's basic interaction patterns using a barebones web prototype, which mapped multi-touch commands to on-screen window spawning behaviors. Even with a bare minimum of visual cues, all test users were able to quickly grasp the interface with minimal instruction and create complex test arrangements with speed and proficiency.
- Carousel is designed to be understood through play: it provides users with a modular set of interactions and rules and invites them to explore different combinations and expressions with these elements.