R vs Python for Data Science

Ivy Professional School
7 min read · Jan 10, 2020

Hundreds of programming languages exist today, such as Java, C, C++, Python, and R. Over the past several decades, some have become obsolete, new ones keep being invented, and demand for each has shifted with how it is used. The last couple of years have witnessed a surge in demand for Data Science roles. As a result, the open-source languages R and Python have emerged as two of the programming languages best liked by developers and employers alike. In this article, R vs Python for Data Science, you will get to know more about the two languages, from their definitions to the job opportunities they offer.

What is R?

R is a language for statistical computing, data analysis, and graphics. It was created by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of Auckland in the early 1990s. R was built as a dialect of S, an earlier programming language.

What is Python?

Python is a general-purpose, object-oriented programming language widely used for almost everything today. Guido van Rossum first released it in 1991, and its development is now managed by the Python Software Foundation. Python was conceived as a successor to the earlier programming language ABC.

There are various steps in the life cycle of Data Science, such as Exploratory Data Analysis, Data Visualization, Statistical Modeling, and Data Cleaning. Let us take a look at the best that R and Python offer at each step, and why each has become a favorite language for Data Science tasks.

Why R in Data Science:

As already mentioned, R is a language aimed at statistical modeling and data analysis. Aspirants from a Mathematics or Statistics background can find this language easy to learn. The latest version is R 3.6.2, released in mid-December last year. Because R is open source and liked by many across the world, the number of contributed packages has risen steeply, and at present there are more than 15,000 packages to pick from. Below are a few good libraries everyone should know.

Data Loading from various sources: RMySQL, haven, xlsx, and odbc load data from a database, an Excel file, an SPSS file, etc.

Data manipulation and wrangling: dplyr and tidyr reshape and transform data, while stringr and lubridate manipulate strings and dates and times.

Visualization of Data: The most popular is ggplot2. esquisse is an add-in that brings a Tableau-like drag-and-drop interface for ggplot2 to R.

Data Modeling: Packages like randomForest, caret, and rpart apply various Machine Learning models.

Natural Language Processing: openNLP provides functions such as Named Entity Recognition, tokenization, and part-of-speech tagging; Topic Modeling is available through packages such as topicmodels.

Additionally, the shiny package makes it easy to build interactive web apps with R. You can find more about various packages here. R also has IDEs such as RStudio and the built-in R GUI.

Why Python in Data Science:

Programming with Python is easy for someone with an IT background. It is also very attractive to beginner programmers wanting to step into the world of Data Science because the language is multifaceted, flexible, and easy to read. Just like R, Python regularly releases new versions. The latest is version 3.8.1, released in mid-December last year. Python has a large community of developers who continuously contribute to its libraries, and it can be connected with several other tools to make even complex tasks manageable. The popular libraries for Data Science and Machine Learning tasks are as follows.

Data Loading from various sources: pyreadstat and pandas load data from Excel, SPSS, and SAS files.

Data manipulation and wrangling: NumPy and pandas let a practitioner reshape, filter, and transform data.

Visualization of Data: seaborn and Matplotlib are excellent libraries for generating attractive visualizations.

Data Modeling: scikit-learn and XGBoost apply various Machine Learning models.

Natural Language Processing: NLTK, TextBlob, and Gensim are very useful libraries for NLP tasks.
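To make the life-cycle steps above a little more concrete, here is a minimal sketch of loading, cleaning, and modeling a tiny dataset. It deliberately uses only the Python standard library so that it runs anywhere; in real work the libraries named above (for example pandas for loading and cleaning, scikit-learn for modeling) would replace each step. The data, the column names, and the helper functions load_rows, clean_rows, and fit_line are all made up for illustration.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for a CSV, Excel, or SPSS file.
RAW_CSV = """hours_studied,exam_score
1,52
2,55
3,
4,70
5,74
"""


def load_rows(text):
    """Data loading: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Data cleaning: drop rows with missing values, convert the rest to float."""
    cleaned = []
    for row in rows:
        if all(value.strip() for value in row.values()):
            cleaned.append({key: float(value) for key, value in row.items()})
    return cleaned


def fit_line(xs, ys):
    """Modeling: ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = clean_rows(load_rows(RAW_CSV))
hours = [r["hours_studied"] for r in rows]
scores = [r["exam_score"] for r in rows]
slope, intercept = fit_line(hours, scores)

print(f"rows kept: {len(rows)}, mean score: {statistics.mean(scores):.2f}")
print(f"score ~ {slope:.2f} * hours + {intercept:.2f}")
```

The row with the missing score is dropped during cleaning, and the line is fitted on the remaining four rows. The same three steps map onto the R packages listed earlier in much the same way.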

Similar to Shiny in R, Dash lets you build analytical web applications in Python. Popular Python environments include Jupyter Notebook and Spyder, along with the IPython interactive shell. The most remarkable thing about Python is its Package Index, PyPI. This site hosts thousands of projects along with their code, which is available for reuse.

R vs Python Features

Big companies using these languages:

Big companies have access to big data and put it to use by generating insights for better decision making. For example, Google uses R to measure advertising effectiveness and for economic forecasting, while Facebook uses it for behavior analysis related to status updates and profile pictures. The image below shows some global companies that use these languages at scale.

Companies using R and Python

The popularity of R vs Python Globally, in the US and India:

Python facts

Python stays well ahead of R in popularity today. This is probably due to the wide variety of uses it gains by integrating with other tools and software, as well as the flexibility and ease of programming it provides. Let us look at some popularity graphs based on the PYPL Index. (The PYPL PopularitY of Programming Language Index is created by analyzing how often language tutorials are searched on Google, and it is updated once a month.) The worldwide ranking shows that Python leads the table with a rise in Google searches as of November 2019, while R's share falls by 0.2%.

PYPL Index
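As a rough illustration of how a search-based index like this can be read, the sketch below treats each language's score as its percentage share of all tutorial searches and reports the month-over-month change in percentage points. This is a simplifying assumption for illustration, not the official PYPL methodology, and the search counts are invented, not real PYPL data.

```python
# Hypothetical tutorial-search counts for two months; not real PYPL data.
previous = {"Python": 2900, "Java": 2000, "R": 400, "Other": 4700}
current = {"Python": 3000, "Java": 1900, "R": 380, "Other": 4720}


def shares(counts):
    """Return each language's percentage share of all tutorial searches."""
    total = sum(counts.values())
    return {lang: 100 * n / total for lang, n in counts.items()}


def share_changes(before, after):
    """Return the change in share, in percentage points, per language."""
    old, new = shares(before), shares(after)
    return {lang: new[lang] - old[lang] for lang in new}


current_shares = shares(current)
changes = share_changes(previous, current)

# List languages from the biggest gain to the biggest loss.
for lang, delta in sorted(changes.items(), key=lambda item: -item[1]):
    print(f"{lang:<7} {current_shares[lang]:5.1f}%  ({delta:+.1f} points)")
```

With these made-up numbers, Python gains a full point while R slips by 0.2 points, which is the kind of movement the ranking above describes.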

Now let us look at the performance of the two languages in the US and India. R is more popular in the US as compared to India, while Python sits at the top in these regions as well.

PYPL Comparison US vs India

Forums and Blogs:

There are innumerable blogs from which one can learn about these programming languages. Below is the list of some blogs and forums related to R and Python.

R Bloggers: A very active blog that also lists R-related jobs from across the globe.

Revolution Analytics Blog: This blog is regularly updated with the latest development work.

Stack Overflow: A very famous forum for programmers and developers which needs no introduction.

Nabble: The name might not suggest statistics or R, but it is a very active forum where you can post your questions about R or data analysis and expect prompt responses.

Python Forum: The forum in the Python community.

Reddit: It is a widely used forum for discussing several programming languages including Python.

Planet Python: A very easy to understand and highly comprehensive blog on Python.

Mouse vs. Python: This blog keeps you up to date with the latest happenings in the Python world.

Job opportunities:

India is not only the second-biggest analytics jobs hub after the US but also accounts for one in 10 advanced analytics job openings in the world. According to The Hindu (Feb 2019), there were 97,000 Data Science jobs vacant in India. AnalyticsIndiaMag surveyed three groups of respondents (hiring managers, job seekers, and students) to get a thorough idea of the hiring scenario in this swiftly developing area. A majority of them favored Python over R for landing a job as a Data Scientist, as shown in the graph below.

Popularity of Python

Various Job Roles for R and Python Programmers:

job roles for R and Python

R vs Python — which one to choose for Data Science:

You are probably much clearer about the two programming languages now than when you stumbled upon this page. At first, you might feel inclined towards Python, given the attention it is attracting globally. However, each individual should think about the questions below to make an informed decision.

1) Your background –

Someone from a Mathematics or Statistics background may lean towards R, as it is robust in statistical modeling. Someone from an IT background might prefer Python. Those from any other background will likely find Python easy to learn and follow.

2) Time to learn –

Some say the learning curve of R is steep, while Python is easy to learn and code. The choice therefore depends on how much time and effort a new user is ready to invest.

3) What problem are you solving –

A good understanding of the problem will give you some idea of which language to start with. For scraping data from websites, Python libraries like Scrapy, Beautiful Soup, and Selenium are pleasant to work with and get you to a solution quickly. If the problem requires intensive statistics, R can be your choice.
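To give a feel for what a scraping task involves, here is a minimal sketch that pulls links out of a page. It uses Python's built-in html.parser module rather than the libraries named above, so it runs without installing anything; Scrapy or Beautiful Soup would make the same job shorter and more robust. The page content and the LinkCollector class are hypothetical, and a real scraper would first download the page from a website.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<html><body>
  <h1>Courses</h1>
  <a href="/courses/r">R for Data Science</a>
  <a href="/courses/python">Python for Data Science</a>
  <a>No link here</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the anchor currently being read, if any
        self._text = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


parser = LinkCollector()
parser.feed(SAMPLE_HTML)

for href, text in parser.links:
    print(href, "->", text)
```

The anchor without an href is skipped, so only the two course links are collected.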

How to learn R or Python –

If you are data-driven and feeling motivated to learn these programming languages then check our courses at www.ivyproschool.com or call us on 7676882222 to know more. We at Ivy provide top class training for R and Python by our hand-picked industry experts. Our Teaching Assistants ensure hands-on experience with many current industry-relevant Data Science projects using R and Python. Visit A Beginner’s Guide To Data Science to know more about Data Science.

Check our grossing courses: https://ivyproschool.com/growwith/


Ivy Professional School

Ivy® Professional School is the official and authorized learning partner of some of the biggest corporate houses in India in the field of Analytics.