Core pattern
Upload 3–5 screenshots for visual reference, then write a 100–200 word text layer describing the app's logic, screens, and data. The screenshots answer “what should it look like?” The text answers “what should it do?”
When to use a screenshot vs text
| Dimension | Screenshot | Text |
|---|---|---|
| Visual layout | Best | Poor |
| Color palette | Best | Moderate |
| Typography feel | Best | Poor |
| App logic | Poor | Best |
| Data shape | Poor | Best |
| Navigation structure | Moderate | Best |
| Native features | Poor | Best |
| Overall tone | Best | Moderate |
The hybrid workflow step by step
- Collect 3–5 reference screenshots for the key screens of your app. Figma exports, competitor screenshots, and hand-drawn mockups all work.
- Annotate key screenshots in Figma or Preview: draw boxes around areas to change and add callouts with specific instructions.
- Write a 100–200 word text layer covering screens, data, stack, and visual tone. The screenshot carries the rest.
- Upload the screenshots and paste the text into your AI builder (screenshot-to-app).
- Generate the first pass, review the preview, iterate with surgical text prompts.
What to include in the text layer
When you have screenshots, the text prompt gets shorter but more pointed:
Attached screenshots are the visual reference for [APP NAME].
App type: [niche] app for [audience].
Match the visual style of the attached screenshots: typography, spacing, color palette, component style.

Screens (Expo Router, [nav type]):
- [Screen 1]: [primary job]
- [Screen 2]: [primary job]
- [Screen 3]: [primary job]

Data (Supabase + RLS):
- [table_1] (fields)
- [table_2] (fields)

Native features: [push / biometric / health]

Changes from attached screenshots:
- [specific change, e.g., "replace the map on screen 2 with a list"]
- [any other modifications]
The text carries the logic; the screenshot carries the visual. Neither duplicates the other.
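If you reuse the template across projects, it can help to fill it in from structured data and check the word budget before pasting. The following is a minimal sketch, not part of any builder's API: the `TextLayerSpec` shape, the function names, and the example app ("TrailLog") are all hypothetical, and the word count is only a rough approximation.

```typescript
// Hypothetical shapes for the text layer; every field name here is illustrative.
interface ScreenSpec { name: string; job: string }
interface TableSpec { name: string; fields: string[] }
interface TextLayerSpec {
  appName: string;
  niche: string;
  audience: string;
  navType: string;
  screens: ScreenSpec[];
  tables: TableSpec[];
  nativeFeatures: string[];
  changes: string[];
}

// Fill the template from the article with values from the spec.
function buildTextLayer(spec: TextLayerSpec): string {
  const features =
    spec.nativeFeatures.length > 0 ? spec.nativeFeatures.join(" / ") : "none";
  const lines: string[] = [
    `Attached screenshots are the visual reference for ${spec.appName}.`,
    `App type: ${spec.niche} app for ${spec.audience}.`,
    "Match the visual style of the attached screenshots: typography, spacing, color palette, component style.",
    `Screens (Expo Router, ${spec.navType}):`,
    ...spec.screens.map((s) => `- ${s.name}: ${s.job}`),
    "Data (Supabase + RLS):",
    ...spec.tables.map((t) => `- ${t.name} (${t.fields.join(", ")})`),
    `Native features: ${features}`,
    "Changes from attached screenshots:",
    ...spec.changes.map((c) => `- ${c}`),
  ];
  return lines.join("\n");
}

// Rough word count: whitespace-separated tokens containing a letter or digit,
// so bare list dashes are not counted.
function countWords(text: string): number {
  return text.split(/\s+/).filter((w) => /[A-Za-z0-9]/.test(w)).length;
}

// The 100-200 word range comes from the guide; treat it as a guideline.
function isWithinWordBudget(text: string, min = 100, max = 200): boolean {
  const n = countWords(text);
  return n >= min && n <= max;
}

// Example values are made up for illustration.
const example: TextLayerSpec = {
  appName: "TrailLog",
  niche: "hiking journal",
  audience: "weekend hikers",
  navType: "bottom tabs",
  screens: [
    { name: "Home", job: "list recent hikes" },
    { name: "Log", job: "record a new hike" },
  ],
  tables: [{ name: "hikes", fields: ["id", "user_id", "distance_km"] }],
  nativeFeatures: ["push"],
  changes: ["replace the map on screen 2 with a list"],
};

const textLayer = buildTextLayer(example);
```

A short spec like this example will fall under the 100-word floor, which is a useful signal that the screen jobs or data fields need more detail before you paste the prompt.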
Preparing screenshots that steer output
- Resolution: 1x–2x device resolution. Blurry screenshots confuse the vision model.
- Clean crops: no screenshots of screenshots, no device frame bezels. Crop to the app content.
- One screen per image: do not stitch multiple screens into one image. Upload separately.
- Annotations help: Figma comments, arrows, sticky notes with specific instructions. AI vision models read them.
- Remove clutter: personal data visible in a screenshot (emails, phone numbers) can bleed into the generated output. Blur or replace it before uploading.
Iterating after the first pass
After the first generation, keep iterating with text prompts; most changes do not require re-uploading screenshots. Typical prompts:
- “On Home, match the card spacing in the second screenshot more closely.”
- “Change the accent color to match the orange in the reference image.”
- “The header in the third screenshot has a gradient — apply that to Home.”
- “The icon style in my reference is outlined, not filled.”
Refer to screenshots by their upload order (“the second one”) so the AI knows which one to attend to.
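If you script your follow-up prompts, numbered references like these can be generated consistently. This is a small sketch with hypothetical helper names (`screenshotRef`, `iterationPrompt`); it assumes screenshots are numbered by upload order and caps the range at five, the upper bound this guide recommends.

```typescript
// Ordinals for up to five screenshots.
const ORDINALS = ["first", "second", "third", "fourth", "fifth"] as const;

// Hypothetical helper: turn a 1-based upload position into a phrase for the prompt.
function screenshotRef(position: number): string {
  if (!Number.isInteger(position) || position < 1 || position > ORDINALS.length) {
    throw new RangeError(`position must be an integer from 1 to ${ORDINALS.length}`);
  }
  return `the ${ORDINALS[position - 1]} screenshot`;
}

// Hypothetical helper: compose one targeted follow-up prompt.
function iterationPrompt(screen: string, instruction: string, position: number): string {
  return `On ${screen}, ${instruction} in ${screenshotRef(position)}.`;
}

const followUp = iterationPrompt("Home", "match the card spacing", 2);
// followUp === "On Home, match the card spacing in the second screenshot."
```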
Common mistakes
- Uploading a screenshot and writing no text. You get the visual tone but no logic, so the app will be hollow.
- Uploading 15 screenshots. The model starts losing attention past five.
- Screenshots that contradict the text. Text will usually win — align them.
- Submitting a clone of a competitor’s app to the store. Reference, don’t clone.
- Forgetting to specify the stack in text. Screenshots say nothing about React Native vs web.
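Several of these mistakes can be caught mechanically before you hit generate. The sketch below is a hypothetical preflight check, not a feature of any builder: the `Submission` shape, the `preflight` name, and the default stack keyword list are assumptions, while the 3–5 screenshot and 100–200 word thresholds come from this guide. It cannot detect screenshots that contradict the text; that still needs a human read.

```typescript
// Hypothetical description of what you are about to submit.
interface Submission {
  screenshotCount: number;
  textLayer: string;
}

// Assumed keyword list for "the stack"; adjust to whatever you actually build with.
const STACK_KEYWORDS = ["react native", "expo", "supabase"];

// Returns a list of human-readable problems; an empty list means nothing was flagged.
function preflight(sub: Submission, stackKeywords: string[] = STACK_KEYWORDS): string[] {
  const problems: string[] = [];
  const text = sub.textLayer.trim();
  const lower = text.toLowerCase();
  // Rough word count: whitespace-separated tokens containing a letter or digit.
  const words = text.split(/\s+/).filter((w) => /[A-Za-z0-9]/.test(w)).length;

  if (sub.screenshotCount < 3 || sub.screenshotCount > 5) {
    problems.push(`expected 3-5 screenshots, got ${sub.screenshotCount}`);
  }
  if (text.length === 0) {
    problems.push("no text layer: the output will have visual tone but no logic");
  } else {
    if (words < 100 || words > 200) {
      problems.push(`text layer is ${words} words, outside the 100-200 range`);
    }
    if (!stackKeywords.some((k) => lower.includes(k.toLowerCase()))) {
      problems.push("text layer does not mention the stack");
    }
  }
  return problems;
}

// Example: too many screenshots and no text at all.
const issues = preflight({ screenshotCount: 15, textLayer: "" });
```

Returning a list of problems rather than throwing on the first one lets you fix everything in a single pass.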
The faster path
Try the hybrid workflow in ShipNative’s screenshot-to-app. Upload your visual reference, paste the text layer, preview in seconds. For the text layer template, use the scaffold prompt from Prompt Engineering for Mobile Apps: A Founder’s Playbook.