The availability of real-world (e.g., routinely collected) data has allowed researchers to generate massive amounts of evidence on epidemiology, natural history, disease burden, and drug efficacy. However, very few studies conducted with these data use validated code algorithms to identify the study cohort, exposures, or control variables. Even when algorithms are validated, their performance is often suboptimal. Several research groups and government agencies have offered recommendations on when and how algorithms should be validated and how the results should be reported.

Key learning objectives:

- The majority of studies performed with real-world data lack adequate algorithm validation.
- Exposure and outcome algorithms are often more important to validate than population identification algorithms.
- Positive predictive value, although the most frequently reported validation statistic, may not be the most useful or important one.
- Validating algorithms for rare conditions requires a different approach than validating algorithms for common ones.
- Medical record review remains the only reliable validation method in most cases and cannot be reliably performed with artificial intelligence techniques.
- Validating code algorithms using accepted methods improves study quality and increases the chance of publication in higher-impact journals.
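The points about PPV and rare conditions can be illustrated with a short sketch. All counts below are hypothetical: two algorithms with identical sensitivity (0.90) and specificity (0.99) are validated against medical record review, but the condition's prevalence differs (10% vs. 1%). The rarer condition yields a far lower PPV, which is why PPV alone can mislead and why rare-condition validation needs a different approach (e.g., enriched sampling of algorithm-positive charts).

```python
# Hypothetical illustration: standard accuracy measures for a code
# algorithm validated against a chart-review gold standard.
def validation_stats(tp, fp, fn, tn):
    """Return common validation statistics from 2x2 counts:
    tp/fp/fn/tn = true positives, false positives, false negatives,
    true negatives relative to medical record review."""
    return {
        "PPV": tp / (tp + fp),           # positive predictive value
        "NPV": tn / (tn + fn),           # negative predictive value
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Common condition: 10% prevalence in 1,000 patients (counts invented).
common = validation_stats(tp=90, fp=9, fn=10, tn=891)

# Rare condition: 1% prevalence in 10,000 patients, with the SAME
# sensitivity (0.90) and specificity (0.99) as above.
rare = validation_stats(tp=90, fp=99, fn=10, tn=9801)

print(round(common["PPV"], 3))  # 0.909
print(round(rare["PPV"], 3))    # 0.476
```

Even though the algorithm itself is unchanged, the drop in prevalence drags PPV from roughly 0.91 to under 0.5, since false positives accumulate from the much larger pool of non-cases.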