
Sr. Cloud Data Engineer/Python Developer Resume


Atlanta, GA

SUMMARY

  • 7+ years of experience in the IT industry as a Data Engineer/Python Developer on cloud platforms, in areas including Data Analysis, Statistical Analysis, Machine Learning, Deep Learning, and Data Mining with large structured, unstructured, and Big Data data sets.
  • Good experience with Django, a high-level Python web framework. Experienced in object-oriented programming (OOP) concepts using Python, Django, and Linux.
  • Experience in building end-to-end data science solutions using R, Python, SQL, and Tableau by leveraging machine learning algorithms, Statistical Modeling, Data Mining, Natural Language Processing (NLP), and Data Visualization.
  • Extensively worked on Spark with Scala on clusters for computational analytics; installed it on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle/Snowflake.
  • Hands-on with Amazon Web Services (AWS) for creating and managing EC2, Elastic MapReduce, Elastic Load Balancers, Elastic Container Service (Docker containers), S3, Lambda, Elastic File System, RDS, CloudWatch, CloudTrail, IAM, and Kinesis Streams.
  • Experience with real-time data sources and message ingestion, filtering, aggregating, and preparing data for analysis using technologies such as Spark Streaming, Kafka, AWS Kinesis, and Kinesis Firehose.
  • Experience working with relational databases (RDBMS) such as Snowflake, MySQL, PostgreSQL, and SQLite, and the NoSQL database MongoDB for database connectivity.
  • Experienced in Agile methodologies, Scrum stories, and sprints in a Python-based environment, along with data analytics and data wrangling.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib/Seaborn, and R ggplot2/Shiny to generate charts such as box plots, scatter charts, pie charts, and histograms, and to create visually impactful, actionable, interactive reports and dashboards.
  • Extensive experience implementing version control software such as Git.
  • Experienced in writing SQL queries, stored procedures, functions, packages, tables, views, and triggers in relational databases such as Oracle, DB2, MySQL, Sybase, PostgreSQL, and MS SQL Server.
  • Experience using Docker and Ansible to fully automate the deployment and execution of a benchmark suite on a cluster of machines.
  • Extensively worked on Jenkins for continuous integration and set up end-to-end automation for all builds and deployments.
  • Experience in developing machine learning models such as Classification, Regression, Clustering, and Decision Trees.
  • Experienced with the full Software Development Lifecycle (SDLC), Model-View-Controller (MVC) architecture using Django and Flask, Object-Oriented Programming, Database Design, and Agile methodologies.
  • Excellent analytical and problem-solving skills, with the ability to work independently as well as being a valuable, contributing team player.

TECHNICAL SKILLS

Web Technologies: jQuery, CSS, HTML and AngularJS

Programming: C, C++, Python, Perl, Java and Scala

Cloud: AWS and Microsoft Azure

Methodologies: OOPS, OOAD, Agile & Scrum

Automation Tools: Puppet, Chef, Ansible, Kickstart, Jumpstart and Terraform

Databases: Oracle 9i, PostgreSQL, MySQL, MongoDB, Snowflake (cloud)

Protocols: TCP/IP, HTTP, HTTPS, FTP & SOAP

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka

Operating Systems: Windows 98/2000/XP/Vista, Windows 7/8/8.1/10, UNIX Version 6

IDEs: RAD 7.0, Notepad++, Visual Studio 2010, Eclipse

Version Control Tools: Tortoise SVN, CVS

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

Sr. Cloud Data Engineer/Python Developer

Responsibilities:

  • Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations (a minimal sketch follows this list).
  • Added support for Amazon AWS S3 and RDS to host static/media files and the database in the Amazon cloud.
  • Managed data imported from different data sources, performed transformations using Hive, Pig, and MapReduce, and loaded the data into HDFS.
  • Used the Oozie workflow engine to run multiple Hive and Pig jobs that execute independently based on time and data availability, and developed an Oozie workflow to trigger jobs on the availability of transaction data.
  • Used MongoDB to store data in JSON format and developed and tested many dashboard features using Python, Bootstrap, CSS, and JavaScript.
  • Built data import and export jobs to copy data to and from HDFS using Sqoop, and developed Spark and Spark SQL/Streaming code for faster testing and processing of data.
  • Analyzed SQL scripts and designed solutions to implement them using PySpark.
  • Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data using Spark SQL.
  • Worked on different data formats such as JSON and XML, and applied machine learning algorithms in Python.
  • Automated the resulting scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production.
  • Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous and heterogeneous data sources, and built various graphs for business decision-making using the Python matplotlib library.
  • Developed scripts to load data into Hive from HDFS and was involved in ingesting data into the data warehouse using various data loading techniques.
  • Scheduled jobs using crontab, Rundeck, and Control-M.
  • Developed entire frontend and backend modules using Python on the Django web framework and created the user interface (UI) using JavaScript, Bootstrap, and HTML5/CSS, backed by Cassandra and MySQL.
  • Developed and managed cloud VMs with AWS EC2 command-line clients and the management console.
  • Worked with version control tools such as Git to attribute versions to different contributors and record the project at different points in time.
  • Virtualized the AWS servers using Docker, created the Dockerfiles, and version-controlled them.
  • Implemented a CI/CD pipeline with Jenkins, GitHub, Nexus, Maven, and AWS AMIs.
  • Worked on designing tables in Hive and MySQL, using Sqoop to import and export databases to and from HDFS, and was involved in processing large datasets of different forms, including structured, semi-structured, and unstructured data.
  • Developed REST APIs using Python with the Flask and Django frameworks and integrated various data sources, including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files.
  • Worked with the Hadoop architecture and its daemons, including NameNode, DataNode, JobTracker, TaskTracker, and ResourceManager.
  • Designed 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
  • Implemented the ETL process and wrote and optimized SQL queries to perform data extraction and merging from a SQL Server database.
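Illustrative sketch (not project code): a minimal PySpark job showing the read, enrich, and load pattern referenced in the list above. The bucket path, Hive table names, and column names are hypothetical placeholders.

    # Minimal PySpark sketch of the read -> enrich -> load pattern.
    # All paths, tables, and columns below are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("enrichment-pipeline")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read raw JSON events from an external source (e.g., an S3 landing zone).
    events = spark.read.json("s3://example-bucket/landing/events/")

    # Read a reference dimension already registered in Hive.
    customers = spark.table("warehouse.customer_dim")

    # Merge and enrich: join events to the dimension and derive columns.
    enriched = (
        events.join(customers, on="customer_id", how="left")
        .withColumn("event_date", F.to_date("event_ts"))
        .withColumn("is_weekend", F.dayofweek("event_date").isin(1, 7))
    )

    # Load into the target destination as a partitioned Hive table.
    (
        enriched.write
        .mode("overwrite")
        .partitionBy("event_date")
        .saveAsTable("warehouse.customer_events_enriched")
    )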

Confidential, New York

Python Developer/Data Engineer

Responsibilities:

  • Designed and built statistical models and feature extraction systems and used models to solve business problems related to company’s data pipeline and communicated these solutions to executive stakeholders.
  • Researched and implemented various Machine Learning Algorithms using the R language.
  • Devised a machine learning algorithm using Python for facial recognition.
  • Used R for prototyping on sample data exploration to identify the best algorithmic approach, and then wrote Scala scripts using the Spark machine learning module.
  • Developed pre-processing pipelines for DICOM and non-DICOM images.
  • Developed and presented analytical insights on medical data and image data.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse and created DAGs to run in Airflow (see the sketch after this list).
  • Created several types of data visualizations using Python and Tableau.
  • Collected data needs and requirements by interacting with other departments.
  • Worked on different data formats such as JSON and XML.
  • Used Scala scripts to run the Spark machine learning library APIs for decision tree, ALS, and logistic and linear regression algorithms.
  • Worked on JSON-based REST web services and Amazon Web Services (AWS); responsible for setting up a Python REST API framework using Django. Worked on deployment, data security, and troubleshooting of the applications using AWS services.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
  • Configured EC2 instances, IAM users, and roles, and created an S3 data pipeline using the Boto API to load data from internal data sources.
  • Implemented Agile methodology for building an internal application.
  • Conducted statistical analysis on healthcare data using Python and various tools.
  • Experience with cloud-hosted version control platforms such as GitHub.
  • Worked closely with data scientists to understand the data requirements for the experiments.
  • Deep experience using DevOps technologies such as Jenkins, Docker, and Kubernetes.
  • Implemented AWS Lambda functions to drive real-time monitoring dashboards from system logs.
  • Worked on migrating on-premises virtual machines to the Amazon Web Services (AWS) cloud.
  • Developed Merge jobs in Python to extract and load data into a MySQL database.
  • Developed PySpark and Spark SQL code to process the data in Apache Spark on Amazon EMR and perform the necessary transformations based on the STMs developed.
  • Worked on different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Designed and engineered REST APIs and packages that abstract feature extraction and complex prediction/forecasting algorithms for time series data.
  • Developed a Python application for Google Analytics aggregation and reporting and used Django configuration to manage URLs and application parameters.
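Illustrative sketch (not project code): a minimal Airflow 2.x DAG showing how a daily job could copy S3-staged data into Snowflake with the Snowflake Python connector. The DAG id, stage, table, and connection parameters are hypothetical placeholders.

    # Minimal Airflow 2.x sketch: daily COPY of S3-staged data into Snowflake.
    # All identifiers and credentials below are hypothetical placeholders.
    from datetime import datetime

    import snowflake.connector
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def copy_s3_to_snowflake():
        # Connection parameters would normally come from a secrets backend.
        conn = snowflake.connector.connect(
            account="xy12345",
            user="etl_user",
            password="***",
            warehouse="ETL_WH",
            database="ANALYTICS",
            schema="RAW",
        )
        try:
            # Load JSON files from an external stage into a raw table.
            conn.cursor().execute(
                "COPY INTO raw_events FROM @events_stage "
                "FILE_FORMAT = (TYPE = 'JSON')"
            )
        finally:
            conn.close()

    with DAG(
        dag_id="s3_to_snowflake_daily",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load_task = PythonOperator(
            task_id="copy_s3_to_snowflake",
            python_callable=copy_s3_to_snowflake,
        )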

Confidential, Charlotte, NC

Python Developer/Cloud Data Engineer

Responsibilities:

  • Created Informatica mappings using various transformations such as Joiner, Aggregator, Expression, Filter, and Update Strategy.
  • Implemented Apache Airflow for authoring, scheduling and monitoring Data Pipelines.
  • Created tasks and workflows in the Workflow Manager and monitored the sessions in the Workflow Monitor.
  • Performed maintenance, including managing space, removing bad files, removing cache files, and monitoring services.
  • Designed and developed Web services using XML and jQuery.
  • Improved performance by using a more modularized approach and more built-in methods.
  • Experienced in Agile Methodologies and SCRUM Process.
  • Worked independently and collaboratively throughout the complete analytics project lifecycle, including data extraction/preparation, design and implementation of scalable machine learning analyses and solutions, and documentation of results.
  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Used Django framework for application development.
  • Designed and developed the UI of the website using HTML, AJAX, CSS, and JavaScript.
  • Moved the mappings from development environment to test environment.
  • Designed ETL Process using Informatica to load data from Flat Files, and Excel Files to target Oracle Data Warehouse database.
  • Interacted with the business community and database administrators to identify the business requirements and data realities.
  • Involved in the entire lifecycle of the projects, including design, development, deployment, testing, implementation, and support.
  • Built various graphs for business decision making using Python matplotlib library.
  • Used NumPy for numerical analysis of insurance premiums.
  • Handled day-to-day issues and fine-tuned the applications for enhanced performance.
  • Worked on ETL migration services by developing and deploying AWS Lambda functions for a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena (see the sketch after this list).
  • Created data pipelines for different events to load the data from DynamoDB to an AWS S3 bucket and then into an HDFS location.
  • Created various transformations according to the business logic like Source Qualifier, Normalizer, Lookup, Stored Procedure, Sequence Generator, Router, Filter, Aggregator, Joiner, Expression and Update Strategy.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Analyzed and visualized the data using the Matplotlib and Seaborn libraries in Python.
  • Processed raw data from CSV files into organized form by applying data cleaning techniques using pandas and NumPy.
  • Maintained program libraries, user manuals, and technical documentation.
  • Wrote unit test cases for testing tools.
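Illustrative sketch (not project code): a minimal AWS Lambda handler, with a hypothetical crawler name, showing how an S3 event could trigger a Glue crawler so newly landed data becomes queryable from Athena.

    # Minimal Lambda sketch: refresh the Glue Catalog when new objects land in S3.
    # The crawler name and event wiring are hypothetical placeholders.
    import json

    import boto3

    glue = boto3.client("glue")
    CRAWLER_NAME = "raw-events-crawler"  # hypothetical crawler name

    def lambda_handler(event, context):
        # S3 object-created notifications arrive as a list of Records.
        new_keys = [rec["s3"]["object"]["key"] for rec in event.get("Records", [])]

        # Re-crawl so the new partitions/files appear in the Glue Catalog
        # and can be queried from Athena.
        if new_keys:
            glue.start_crawler(Name=CRAWLER_NAME)

        return {"statusCode": 200, "body": json.dumps({"new_objects": new_keys})}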

Confidential, Chicago, IL

Data Analyst/Data Engineer

Responsibilities:

  • Implemented a complete data science project involving data acquisition, data wrangling, exploratory data analysis (EDA), model development, and model evaluation (a minimal sketch follows this list).
  • Worked with the analytics team to update regular reports and provide solutions.
  • Created visualizations for the extracted data using Tableau.
  • Analyzed metadata and processed data to get better insights of the data.
  • Analyzed the pre-existing predictive model developed by the advanced analytics team and the factors considered during model development.
  • Experienced in various Python libraries such as pandas and NumPy (one- and two-dimensional arrays).
  • Identified patterns and meaningful insights by analyzing the data.
  • Created initial data visualizations in Tableau to provide basic insights into the data for project stakeholders.
  • Performed extensive exploratory data analysis using Teradata to improve the quality of the dataset, developed machine learning algorithms in Python to predict model quality, and created data visualizations using Tableau.
  • Experienced in using the PyTorch library and implementing natural language processing.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
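Illustrative sketch (not project code): a minimal pandas data-wrangling and EDA pass of the kind described in the list above; the input file and columns are hypothetical placeholders.

    # Minimal pandas sketch: basic profiling, duplicate removal, and imputation.
    # The input file and its columns are hypothetical.
    import pandas as pd

    df = pd.read_csv("claims_sample.csv")

    # Basic profiling: shape, dtypes, and descriptive statistics.
    print(df.shape)
    print(df.dtypes)
    print(df.describe(include="all"))

    # Wrangling: drop exact duplicates, impute missing numeric values
    # with each column's median.
    df = df.drop_duplicates()
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

    # Quick check of remaining missing values per column.
    print(df.isna().sum().sort_values(ascending=False).head())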
