While SAS is still widely respected and used, Python is increasingly becoming the language of choice because it is open source, easy to learn, and backed by readily available machine learning and artificial intelligence libraries. Based on my experience using SAS, R, Python, and SQL on various client projects, and on my work developing an automatic SAS-to-Python conversion tool, I believe data analysts and SAS programmers would benefit from a comprehensive list of resources and guides for translating SAS code to Python.
SAS is a powerful tool for data analysis, statistical model development, and data science in general, and it has been a leading analytics package since the 1980s. Many corporations, research institutions, and government agencies have used SAS as their main analytic tool for years. The SAS DATA step is arguably the most powerful feature of the SAS language: it can union, join, and filter datasets; add, remove, and modify columns; and plainly express conditional and looping business logic. The Centers for Medicare & Medicaid Services (CMS) has been one of the major users of SAS software and continues to rely heavily on SAS for data analytics and model development. That is one of the reasons I have used the CMS Data Entrepreneurs' Synthetic Public Use Files (DE-SynPUFs) as raw input for this analysis.
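To make the DATA step operations listed above concrete, here is a minimal pandas sketch of how a union, join, filter, conditional column, and column drop might look in Python. The data frames, column names (`bene_id`, `paid_amt`, `sex_cd`, `cost_tier`), and values are hypothetical toy data, not the DE-SynPUF layout, and the SAS statements in the comments are only rough equivalents.

```python
import pandas as pd

# Hypothetical toy data; names and values are made up for illustration.
claims_2008 = pd.DataFrame({"bene_id": ["A1", "A2"], "paid_amt": [120.0, 0.0]})
claims_2009 = pd.DataFrame({"bene_id": ["A1", "A3"], "paid_amt": [75.5, 300.0]})
benes = pd.DataFrame({"bene_id": ["A1", "A2", "A3"], "sex_cd": [1, 2, 2]})

# Union, roughly SAS: set claims_2008 claims_2009;
claims = pd.concat([claims_2008, claims_2009], ignore_index=True)

# Join, roughly SAS: merge claims(in=a) benes; by bene_id; if a;
claims = claims.merge(benes, on="bene_id", how="left")

# Filter, roughly SAS: where paid_amt > 0;
claims = claims[claims["paid_amt"] > 0].copy()

# Conditional logic, roughly SAS: if paid_amt >= 100 then cost_tier = "high"; else cost_tier = "low";
claims["cost_tier"] = claims["paid_amt"].apply(
    lambda amt: "high" if amt >= 100 else "low"
)

# Remove a column, roughly SAS: drop sex_cd;
claims = claims.drop(columns=["sex_cd"])

print(claims)
```

Each pandas call returns a new data frame, so the steps chain in the same top-to-bottom order a DATA step would apply them.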
DE-SynPUF was created by CMS with the goal of providing a realistic set of claims data in the public domain while providing the very highest degree of protection to the Medicare beneficiaries’ protected health information.
The purposes of the DE-SynPUF are to:
- Allow data entrepreneurs to develop and create software and applications that may eventually be applied to actual CMS claims data;
- Train researchers on the use and complexity of conducting analyses with CMS claims data prior to initiating the process to obtain access to actual CMS data; and,
- Support safe data mining innovations that may reveal unanticipated knowledge gains while preserving beneficiary privacy.
The files have been designed so that programs and procedures created on the DE-SynPUF will function on CMS Limited Data Sets. The data structure of the Medicare DE-SynPUF is very similar to that of the CMS Limited Data Sets, but with a smaller number of variables. The DE-SynPUF also provides a robust set of metadata on the CMS claims data that has not previously been available in the public domain. Because of the synthetic processes used to create the file, the DE-SynPUF has very limited inferential research value for drawing conclusions about Medicare beneficiaries. It does, however, provide timely and less expensive access to a realistic Medicare claims data file, to spur the innovation necessary to achieve the goals of better care for beneficiaries and improved health of the population.
DE-SynPUF contains five types of data:
- Beneficiary Summary
- Inpatient Claims
- Outpatient Claims
- Carrier Claims
- Prescription Drug Events
DE-SynPUFs and documentation can be downloaded from https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF
SAS Code and Python Scripts
Using the CMS DE-SynPUF files as input datasets, I worked out this long list of Python scripts, along with the equivalent SAS code where one exists, to demonstrate that most routine data manipulation steps, SAS DATA steps, and frequently used PROCs can be translated to Python relatively easily.
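As a small illustration of what translating frequently used PROCs can look like, the sketch below pairs three common procedures with approximate pandas counterparts. The `claims` table, its column names, and its values are hypothetical, and the SAS statements in the comments are rough analogues rather than exact equivalents.

```python
import pandas as pd

# Hypothetical toy claims; names and values are made up for illustration.
claims = pd.DataFrame({
    "claim_type": ["inpatient", "outpatient", "outpatient", "carrier"],
    "paid_amt": [5000.0, 120.0, 80.0, 45.0],
})

# Roughly PROC FREQ: tables claim_type;
freq = claims["claim_type"].value_counts()

# Roughly PROC MEANS with N, MEAN, SUM: class claim_type; var paid_amt;
means = claims.groupby("claim_type")["paid_amt"].agg(["count", "mean", "sum"])

# Roughly PROC SORT: by descending paid_amt;
sorted_claims = claims.sort_values("paid_amt", ascending=False)

print(freq)
print(means)
print(sorted_claims)
```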
I have included SAS code alongside the Python script in every block; however, you don't need SAS to follow along. If you don't know Python yet, don't be discouraged: it is a very easy language to learn.
To run the scripts here, you need to:
- Install Python in your local environment or open a free Kaggle account at https://www.kaggle.com/
- Learn the basics of Python from https://www.w3schools.com/python/default.asp free of charge.
- Know which Python libraries you need and how to install and import them.
- Know at least one SQL dialect. I have used many SQL examples here because SQL is so widely used.
- Most importantly, bring the willingness and dedication to learn Python.
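For readers coming from SQL or PROC SQL, the sketch below shows one way to import a library and run a SQL query from Python. It uses the `sqlite3` module, which ships with Python and needs no separate installation; the `claims` table, its columns, and its rows are hypothetical and only stand in for real claims data.

```python
import sqlite3

# In-memory database with a hypothetical claims table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (bene_id TEXT, claim_type TEXT, paid_amt REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [
        ("A1", "inpatient", 5000.0),
        ("A1", "outpatient", 120.0),
        ("A2", "outpatient", 80.0),
    ],
)

# The same kind of aggregate query one might write in PROC SQL.
rows = conn.execute(
    """
    SELECT bene_id, COUNT(*) AS n_claims, SUM(paid_amt) AS total_paid
    FROM claims
    GROUP BY bene_id
    ORDER BY bene_id
    """
).fetchall()
conn.close()

for bene_id, n_claims, total_paid in rows:
    print(bene_id, n_claims, total_paid)
```

Third-party libraries such as pandas are imported the same way once they have been installed, typically with `pip install pandas`.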
Download the complete Python code here: Medicare Claims Data Analysis Script
View the list of script examples here (PDF): Medicare Claims Synthetic Public Use Files (SynPUFs) Analysis