This paper addresses the dynamic scheduling of data-intensive multiprocessor jobs. Each job requires a certain number of CPUs and a certain amount of data that must be downloaded into local storage. The completion of each job brings some benefit (utility) to the system, and the goal is to find the optimal scheduling policy that maximizes the average utility per unit of time obtained from all completed jobs. A co-evolutionary solution methodology is proposed, in which the utility-based policies for managing local storage and for scheduling jobs onto the available CPUs mutually affect each other’s environments, and both policies are adaptively tuned using Reinforcement Learning (RL). The simulation results demonstrate that tuning with RL substantially improves the performance of the scheduling policies, to the point that they significantly outperform the best scheduling algorithm suggested in the literature for jobs with soft-deadline utility functions.
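To make the objective stated above concrete, the following minimal sketch computes the average utility per unit of time for a set of completed jobs. It is an illustration only, not the paper's formulation: the `CompletedJob` record, its field names, and the linear decay of utility after the soft deadline are all assumptions introduced here, since the abstract does not specify the shape of the utility functions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CompletedJob:
    """Hypothetical record of a finished job (all field names are illustrative)."""
    finish_time: float    # time elapsed from submission to completion
    soft_deadline: float  # completion time up to which full utility is earned
    max_utility: float    # utility earned when the soft deadline is met
    decay_rate: float     # utility lost per unit of time past the deadline


def soft_deadline_utility(job: CompletedJob) -> float:
    """Full utility up to the soft deadline, then an ASSUMED linear decay to zero."""
    lateness = max(0.0, job.finish_time - job.soft_deadline)
    return max(0.0, job.max_utility - job.decay_rate * lateness)


def average_utility_per_unit_time(jobs: Iterable[CompletedJob], horizon: float) -> float:
    """Total utility of all completed jobs divided by the length of the observation window."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return sum(soft_deadline_utility(job) for job in jobs) / horizon
```

Under these assumptions, a scheduling policy would be judged by the value `average_utility_per_unit_time` returns over a long simulation run: a job that finishes before its soft deadline contributes its full utility, a late job contributes less, and a very late job contributes nothing.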