People can become highly proficient problem solvers in many complex domains, yet we lack clear accounts of what distinguishes expert from novice reasoning, how expertise develops over time, and whether it generalizes across tasks. These gaps are partly methodological: most studies of human problem solving rely on brief laboratory tasks with extrinsic incentives and coarse outcome measures, limiting insight into the cognitive processes underlying reasoning and expertise development. Longitudinal, self-motivated gameplay, combined with detailed process tracing, can help fill these gaps. We introduce mitpuzzles.com, a public platform hosting a suite of constraint-based logic puzzles (e.g., Minesweeper, Sudoku, and Nonograms), instrumented to collect fine-grained mouse-tracking and other behavioral data. Using large-scale longitudinal data from the platform, we analyze how subproblem selection, accuracy, errors, and attentional allocation change with experience. We find effects of learning on search efficiency and accuracy, as well as individual differences suggesting that expert players preferentially attend to difficult but more informative cells. These results highlight the promise of self-directed gameplay as a window into the development and structure of human expertise.