Data Analytics Developer/Python Developer Resume

Piscataway, New Jersey

PROFESSIONAL SUMMARY:

  • IT professional with over 6 years of experience, including 2+ years in Python development.
  • Experienced in developing application software using Python.
  • Experience in analyzing data and creating data pipelines using Python, SQL, Microsoft Excel, Hive, PySpark and Spark SQL for Data Mining, Data Cleansing, Data Munging and Machine Learning.
  • Experience working with data from the Telecommunications, Healthcare and Education domains.
  • Data Discovery and Data Exploration of high-dimensional, high-volume data using Cosmos and Azure Analytics.
  • Sound knowledge of Data Quality and Data Governance practices and processes.
  • Good experience in developing web applications implementing Model-View-Controller (MVC) architecture using the Django and Flask Python web application frameworks.
  • Certified as Azure Data Scientist Associate by Microsoft.
  • Extensive experience in Big Data clusters and Amazon Web Services (EC2, S3, RDS, Elastic Load Balancing, MQ, Lambda, SQS, IAM, CloudWatch, EBS and CloudFormation).
  • Proficient in Postgres, Teradata, Hive, DB2, SQLite, MySQL and other SQL databases with Python.
  • Experienced in working with various Python IDEs and editors such as PyCharm, PyScripter, Notebook, Spyder, Visual Studio Code, IDLE, NetBeans and Sublime Text.
  • Experience with the Requests, NumPy, SciPy, PyTables, cv2, imageio, Python-Twitter, Matplotlib, HTTPLib2, Urllib2, Beautiful Soup and Pandas (Data Frames) Python libraries during the development lifecycle.
  • Hands-on experience in handling database issues and connections with SQL and NoSQL databases like MongoDB and DynamoDB by installing and configuring various Python packages.
  • Extensive experience in ETL (Extraction, Transformation and Loading) of data using Informatica PowerCenter from heterogeneous sources like flat files (fixed width, delimited), XML files and relational databases.
  • Strong understanding of Dimensional Modeling, Star, Snowflake Schema, OLAP and DW concepts.
  • Strong ability to conduct qualitative and quantitative analysis for effective data-driven decision making.
  • Conducted ad-hoc data analysis on large datasets from multiple data sources to provide data insights and actionable advice to support business leaders according to self-service BI goals.
  • Experience in data preprocessing, data analysis, machine learning to get insights into structured and unstructured data.
  • Good knowledge of writing and building different kinds of tests, such as unit tests with pytest.
  • Experienced with version control platforms like GitLab and GitHub to keep the versions and configurations of the code organized.
  • Developed Python scripts to automate Data Analysis.
  • Experience in working with a number of public and private cloud platforms like Amazon Web Services (AWS), Microsoft Azure, Rackspace Cloud and OpenStack.
  • Extensive experience in Amazon Web Services (Amazon EC2, Amazon S3, Amazon SimpleDB, Amazon RDS, Amazon Elastic Load Balancing, Elasticsearch, Amazon MQ).
  • Good experience in using WAMP (Windows, Apache, MySQL, and Python/PHP) and LAMP (Linux, Apache, MySQL, and Python/PHP) architectures.
  • Experience working with message queue services like Apache Kafka, RabbitMQ and ActiveMQ.
  • Experienced with containerization and orchestration services like Docker and Kubernetes.
  • Strong experience in developing RESTful web services with the Python programming language.
  • Well versed in Agile with Scrum, Waterfall and Test-Driven Development (TDD) methodologies.
  • Good experience in Linux Bash scripting and following PEP 8 guidelines in Python.
  • Extensive experience in developing dashboards and reports using tools like Tableau and SSRS.
  • Extensive knowledge of Spark SQL development.
  • Executed complex HiveQL queries for required data extraction from Hive tables and wrote Hive UDFs.
  • Excellent Interpersonal and communication skills, efficient time management and organization skills, ability to handle multiple tasks and work well in a team environment.

TECHNICAL SKILLS:

Operating Systems: Windows 98/2000/XP/7/8, Mac OS and Linux (CentOS, Debian, Ubuntu)

Programming Languages: Python, R, C, C++

Python Libraries/Packages: Psycopg2, ibm_db, teradata, PyHive, NumPy, SciPy, Boto, Pickle, PySide, PyTables, Data Frames, Pandas, Matplotlib, SQLAlchemy, HTTPLib2, Urllib2, Beautiful Soup

Statistical Analysis Skills: A/B Testing, Time Series Analysis, Marko

IDE: PyCharm, PyScripter, Spyder, PyStudio, PyDev, IDLE, NetBeans, Sublime Text, Visual Studio Code

Machine Learning and Analytical Tools: Supervised Learning (Linear Regression, Logistic Regression, Decision Tree, Random Forest, SVM, KNN), Unsupervised Learning (Clustering, Factor Analysis, PCA), Natural Language Processing, Tableau.

Amazon Web Services: EC2, S3, MQ, ECS, Lambda, SageMaker, RDS, SQS, IAM, CloudWatch, EBS and CloudFormation

Databases/Servers: Hive, DB2, MySQL, SQLite3, PostgreSQL, MongoDB

ETL: Informatica 9.6, SSIS.

Web Services/Protocols: HTTP/HTTPS, REST, RESTful

Miscellaneous: GitLab, GitHub

Build and CI tools: Docker, Kubernetes, Jenkins, Screwdriver

SDLC/Testing Methodologies: Agile, Waterfall, Scrum, TDD

PROFESSIONAL WORK EXPERIENCE:

Confidential, Piscataway, New Jersey

Data Analytics Developer/Python Developer

Responsibilities:

  • Developed multiple ETL applications using Python, Spark.
  • Provided production support for Python ETL applications.
  • Developed ETL pipelines to perform analytics on data.
  • Responsible for managing and deploying the product's WSGI server and Apache server on a Microsoft Azure CentOS server.
  • Worked on different databases like Hive, Teradata, Postgres and IBM Db2.
  • Developed Oozie workflows to schedule Spark and Hive jobs.
  • Developed a CI/CD pipeline using Screwdriver.
  • Built Azure Data Lakes that efficiently process high-volume data using the functionalities of Azure Data Lake Analytics.
  • Developed custom scheduler to schedule ETL applications.
  • Reduced performance issues and optimized application code.
  • Built an Azure Analysis Services Tabular cube on Azure Data Lake and Azure Cosmos DB.
  • Developed analytical solutions using Python.
  • Designed, developed and assessed requirements for the new application.
  • Developed analytical dashboards using Tableau.
  • Developed unit tests using pytest.
  • Worked on Azure Data Factory and Azure Databricks as part of EDS transformation.
  • Ingested data from various on-prem APIs into the data lake or Blob Storage through Azure Data Factory.
  • Developed REST APIs using Python with the Flask framework and integrated various data sources including Java, JDBC, RDBMS, shell scripts, spreadsheets and text files.
  • Experienced in writing SQL queries, stored procedures, functions, packages, tables, views and triggers using relational databases like Oracle, DB2, MySQL, Sybase, PostgreSQL and MS SQL Server.
  • Experience in using Docker and Ansible to fully automate the deployment and execution of the benchmark suite on a cluster of machines.
  • Good experience in Linux Bash scripting and following PEP 8 guidelines in Python.
  • Extensive knowledge of developing Spark SQL jobs using DataFrames.
  • Executed complex HiveQL queries for required data extraction from Hive tables and wrote Hive UDFs.
  • Cleansed the data for normal distribution by applying techniques such as missing value treatment, outlier treatment and hypothesis testing.
  • Experienced with containerization and orchestration services like Docker, Kubernetes.
  • Implemented Agile Methodology for building an internal application.
  • Worked with version control platforms like GitLab and GitHub.
  • Experience with issue tracking and log analysis tools like Jira and Splunk.

Confidential, San Ramon, California

Data Engineer/ Python Developer

Responsibilities:

  • Involved in building data pipelines using Python for medical image pre-processing, training and testing.
  • Involved in creating Azure Data Factory pipelines.
  • Involved in developing an Artificial Intelligence platform that helps data scientists train, test and develop AI models on Amazon SageMaker.
  • Used Pandas, OpenCV, NumPy, Seaborn, TensorFlow, Keras, Matplotlib and scikit-learn in Python for developing data pipelines and various machine learning algorithms.
  • Worked on Azure Data Factory and Azure Databricks as part of EDS transformation.
  • Collected data needs and requirements by interacting with other departments.
  • Worked on different data formats such as JSON, XML.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
  • Delivered customer demos on Azure Data Analytics technologies.
  • Implemented the Azure Key Vault service to store vital credentials such as the Service Principal key, database credentials, storage connection strings and others.
  • Cleansed the data for normal distribution by applying techniques such as missing value treatment, outlier treatment and hypothesis testing.
  • Involved in development of web services using REST APIs for sending and receiving data from the external interface in JSON format.
  • Delivered presentations to senior client executives on the Azure Data platform.
  • Configured EC2 instances and created S3 data pipes using Boto API to load data from internal data sources.
  • Implemented Agile Methodology for building an internal application.
  • Worked with AWS Lambda, Amazon SQS, AWS Identity and Access Management (IAM), Amazon CloudWatch, Amazon EBS and AWS CloudFormation.
  • Proficient in SQLite, MySQL and SQL databases with Python.
  • Experience with version control platforms like GitHub.
  • Worked closely with Data Scientists to understand data requirements for the experiments.
  • Experience in using DevOps technologies like Jenkins, Docker and Kubernetes.
  • Worked with PaaS services such as Azure SQL Database, Azure SQL Data Warehouse, Azure SQL Database Managed Services and Azure Analysis Services.

Confidential

Python Developer

Responsibilities:

  • Developed a data platform from scratch and took part in the requirement gathering and analysis phase of the project, documenting the business requirements.
  • Designed tables in Hive and MySQL, used Sqoop to import and export databases to and from HDFS, and processed large datasets of different forms including structured, semi-structured and unstructured data.
  • Developed REST APIs using Python with the Flask framework and integrated various data sources including Java, JDBC, RDBMS, shell scripts, spreadsheets and text files.
  • Developed scripts to load data into Hive from HDFS and was involved in ingesting data into the Data Warehouse using various data loading techniques.
  • Ran import and export jobs to copy data to and from HDFS using Sqoop, and developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Analyzed SQL scripts and designed the solutions to implement using PySpark.
  • Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
  • Used Spark SQL to load JSON data, create a SchemaRDD and load it into Hive tables, and handled structured data using Spark SQL.
  • Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment and loading into target data destinations.
  • Added support for Amazon S3 and RDS to host static/media files and the database in the Amazon cloud.
  • Used MongoDB to store data in JSON format, and developed and tested many dashboard features using Python.

Confidential, Little Rock, Arkansas

Data Science Analyst

Responsibilities:

  • Wrote and executed various MySQL database queries from Python using the Python MySQL connector and the MySQLdb package.
  • Developed and executed complex SQL queries to pull data from data sources like SQL Server and Oracle databases.
  • Evaluated the Information Management System Database to resolve Data Quality issues using DQ Analyzer and other data preprocessing tools.
  • Developed Python scripts to automate Data Analysis.
  • Implemented Data Governance policies and procedures in the Students Information Management Database.
  • Leveraged BI tools (SSRS, Tableau and Business Objects) to create analytics dashboards/scorecards that report useful insights to decision makers.
  • Executed Data Analysis and Data Visualization on survey data using Tableau Desktop, and compared respondents' demographic data with Univariate Analysis using Python (Pandas, NumPy, Seaborn, scikit-learn and Matplotlib).
  • Developed a machine learning model to recommend friends to students based on their similarities.
  • Used Alteryx for Data Preparation in such a way that it is useful for developing reports and visualizations.
  • Analyzed the university research budget against peer universities' budgets in collaboration with the research team, and recommended data standardization and usage to ensure data integrity.
  • Reviewed basic SQL queries and edited inner, left, & right joins in Tableau Desktop by connecting live/dynamic and static datasets.
  • Conducted statistical analysis to validate data and interpretations using Python and R, presented research findings and status reports, and assisted with collecting user feedback to improve the processes and tools.
  • Reported and created dashboards for Technical Services using SSRS, Oracle BI, and Excel. Deployed Excel VLOOKUP, PivotTable, and Access Query functionalities to research data issues.

Confidential

Data Engineer

Responsibilities:

  • Used update strategy to effectively migrate data from source to target.
  • Moved the mappings from development environment to test environment.
  • Designed ETL Process using Informatica to load data from Flat Files, and Excel Files to target Oracle Data Warehouse database.
  • Created various transformations according to the business logic like Source Qualifier, Normalizer, Lookup, Stored Procedure, Sequence Generator, Router, Filter, Aggregator, Joiner, Expression and Update Strategy.
  • Created Informatica mappings using various Transformations like Joiner, Aggregate, Expression, Filter and Update Strategy.
  • Improved workflow performance by shifting filters as close as possible to the source and selecting tables with fewer rows as the master during joins.
  • Used connected and unconnected lookups whenever appropriate, along with the use of appropriate caches.
  • Created tasks and workflows in the Workflow Manager and monitored the sessions in the Workflow Monitor.
  • Performed maintenance, including managing space, removing bad files, removing cache files and monitoring services.
  • Set up permissions for groups and users in all development environments.
  • Migrated developed objects across different environments.
  • Experienced in Agile methodologies and the Scrum process.
  • Maintained program libraries, user manuals and technical documentation.
  • Involved in the entire lifecycle of the projects, including design, development, deployment, testing, implementation and support.
  • Built various graphs for business decision making using Python matplotlib library.
  • Worked on application development, especially in the UNIX environment, and familiar with its commands.
  • Handled day-to-day issues and fine-tuned the applications for enhanced performance.
  • Implemented code in Python to retrieve and manipulate data.
  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Used Django framework for application development.
