
Python Developer/ Cloud Data Engineer Resume


Madison, WI

SUMMARY

  • More than 7 years of IT experience as a Data Engineer and Python Developer on cloud platforms, covering data mining, data analysis, statistical analysis, machine learning, and deep learning over large structured and unstructured data sets and big data platforms.
  • Able to perform data manipulation using Python data types such as lists, tuples, sets, iterators, and generators, along with list and dictionary comprehensions and Pandas DataFrames (see the sketch after this list).
  • Experience with AWS services such as EC2, S3, EMR, RDS, Glue, Lambda, Auto Scaling, Elastic Beanstalk, CloudFormation, and Redshift.
  • Experience using analytic data warehouses such as Snowflake, Databricks, and Hadoop.
  • Developed Spark code using Python/Scala and Spark SQL for faster testing and processing of data.
  • Experience using Databricks to handle the full analytical process, from ETL to data modeling, leveraging familiar tools, languages, and skills via interactive notebooks or APIs.
  • Experienced in designing RESTful APIs to perform CRUD operations using Django REST Framework.
  • Experience developing and maintaining applications built on Amazon Simple Storage Service (S3), Amazon EMR, and Amazon CloudWatch.
  • Proficient in Data Warehousing, Data Mining concepts and ETL transformations from Source to target systems.
  • Expertise in DevOps, release engineering, configuration management, cloud infrastructure, and automation, including Amazon Web Services (AWS), Apache Maven, Jenkins, GitHub, and Linux.
  • Set up databases in AWS using RDS, storage using S3 buckets, and configured instance backups to an S3 bucket.
  • Good experience in shell scripting, SQL Server, UNIX, and Linux, and working knowledge of the version control software GitHub.
  • Experienced in migrating ETL processes, implementing transformation and join operations.
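
As a small illustration of the data-manipulation bullet above, the sketch below uses set and dict comprehensions, a generator expression, and a Pandas DataFrame for grouped aggregation; the event records are hypothetical and not tied to any specific project.

    # Illustrative only: hypothetical event records.
    import pandas as pd

    events = [
        {"user": "a1", "action": "click", "ms": 120},
        {"user": "b2", "action": "view", "ms": 340},
        {"user": "a1", "action": "view", "ms": 95},
    ]

    # Set and dict comprehensions plus a lazy generator expression.
    users = {e["user"] for e in events}
    ms_by_action = {e["action"]: e["ms"] for e in events}   # later records win
    slow_events = (e for e in events if e["ms"] > 100)

    # The same records as a Pandas DataFrame for grouped aggregation.
    df = pd.DataFrame(events)
    print(df.groupby("action")["ms"].agg(["count", "mean"]))
    print(users, ms_by_action, list(slow_events))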

TECHNICAL SKILLS

Languages: HTML, CSS, JSP, Java, Python, Scala, JavaScript

Operating Systems: Windows, UNIX, Mac OS

Hadoop Ecosystem: HDFS, Scala, MapReduce, Hive, Pig, Oozie, Snowflake, Kafka

RDBMS/DBMS: MS SQL Server, Oracle, DB2, PostgreSQL, NoSQL

Web Frameworks: JavaScript, jQuery, CSS3, Hibernate

AWS Components: EMR, S3, EC2, EBS, Redshift

ETL Tools: Data Transformation Services, SQL Server Integration Services

Reporting Tools: Power BI, MS Excel

Tools/Methodologies: Eclipse, GitHub, Agile/Scrum, Jenkins

Databases: NoSQL, Oracle, SQL Server, MySQL, Redshift

Cloud: AWS, Azure

Web Technologies: HTML5, JavaScript, XML, JSON, jQuery, CSS3

PROFESSIONAL EXPERIENCE

Confidential, Madison, WI

Python Developer/ Cloud Data Engineer

Responsibilities:

  • Responsible for loading customer's data and event logs from Kafka into HBase using REST API.
  • Developed data pipeline using Flume, Sqoop, and Pig to extract the data from weblogs and store in HDFS.
  • Worked on designing, building, deploying, and maintaining MongoDB.
  • Worked on cloud deployments using Maven, Docker, and Jenkins.
  • Worked with various sources like relational databases, XML, JSON, CSV files to perform ETL operations to the target data marts/warehouses.
  • Implemented Kafka model which pulls the latest records into Hive external tables.
  • Worked extensively on AWS components like Elastic MapReduce (EMR), Elastic Compute Cloud (EC2), and Simple Storage Service (S3).
  • Used AWS Glue for the data transformation, validation, and data cleansing.
  • Created data pipelines for different events to load the data from DynamoDB to AWS S3 bucket and then into HDFS location.
  • Developed a POC pipeline to compare performance and efficiency when running the pipeline on an AWS EMR Spark cluster.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
  • Used the Kafka consumer API in Scala to consume data from Kafka topics.
  • Knowledge of PySpark; used Hive to analyze sensor data and cluster users based on their behavior at events.
  • Ingested data from RDBMS sources, performed data transformations, and exported the transformed data to Cassandra per business requirements.
  • Loaded the dataset into Hive for ETL (Extract, Transform, and Load) operations.
  • Used ETL to extract, clean, transform and load data into the data warehouse/data marts.
  • Implemented the ETL from S3 to Snowflake using Python (see the Snowflake sketch after this list).
  • Used Snowflake connectors for Tableau and developed reports using Snowflake views and tables.
  • Developed Spark applications using Scala and Spark-SQL for data extraction, transformation, and aggregation from multiple data formats to uncover insights into the customer usage patterns.
  • Performance-tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings. Created on-demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark.
  • Worked extensively on importing metadata into Hive using Python and migrated existing tables and applications to the AWS cloud (S3).
  • Used Python Boto3 to configure the AWS services Glue, EC2, and S3.
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Developed Data frames for data transformation rules.
  • Developed workflows using Airflow to automate the tasks of loading data into HDFS (see the Airflow sketch after this list).
  • Deployed and tested (CI/CD) our developed code using Visual Studio Team Services (VSTS).
  • Participated in code reviews with peers to ensure proper test coverage and consistent code standards.
  • Performed cross-team and internal reviews for multiple deliverables produced by the data engineering team.
  • Developed and analysed the SQL scripts and designed the solution to implement using PySpark.
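
The S3-to-Snowflake load mentioned above could look roughly like this minimal sketch, assuming the snowflake-connector-python package, credentials supplied via environment variables, and an external stage (@raw_stage) that already points at the S3 bucket; the warehouse, database, schema, and table names are hypothetical.

    # Minimal sketch under the assumptions stated above.
    import os
    import snowflake.connector

    conn = snowflake.connector.connect(
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="STAGING",
    )
    try:
        cur = conn.cursor()
        # COPY INTO pulls the staged CSV files from S3 into the target table.
        cur.execute("""
            COPY INTO STAGING.EVENTS
            FROM @raw_stage/events/
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
            ON_ERROR = 'CONTINUE'
        """)
    finally:
        conn.close()

In practice a load like this would typically be scheduled (for example from Airflow) or triggered by S3 events via Lambda rather than run by hand.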
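
The Airflow workflow for loading data into HDFS might be structured like this hypothetical DAG; the dag_id, schedule, S3/HDFS paths, and the Airflow 2.x BashOperator import are assumptions about the environment, not the original pipeline.

    # Hypothetical DAG: pull the day's extract from S3, then push it into HDFS.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_hdfs_load",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        fetch = BashOperator(
            task_id="fetch_from_s3",
            bash_command="aws s3 cp s3://example-bucket/extracts/{{ ds }}/ /tmp/extracts/{{ ds }}/ --recursive",
        )
        load = BashOperator(
            task_id="load_to_hdfs",
            bash_command="hdfs dfs -mkdir -p /data/raw && hdfs dfs -put -f /tmp/extracts/{{ ds }} /data/raw/{{ ds }}",
        )
        fetch >> load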

Confidential, Dallas, TX

Python Data Engineer

Responsibilities:

  • Analysed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
  • Worked on Spark using Scala and SparkSQL for faster testing and processing of data.
  • Built Spark Scripts by utilizing Scala shell commands depending on the requirement.
  • Worked with NoSQL databases like HBase to create tables and store data.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Collected all the logs from source systems into HDFS using Kafka and performed analytics on them.
  • Developed Spark applications using Scala and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Undertook data analysis and collaborated with the downstream analytics team to shape the data to their requirements.
  • Designed and developed real time Stream processing Application using Spark, Kafka, Scala, and Hive to perform Streaming ETL and apply Machine learning.
  • Designed data models using the SQLAlchemy ORM connected to a MySQL database on AWS RDS, so that data entered from the web page is stored directly in the database (see the SQLAlchemy sketch after this list).
  • Designed the front end and back end of the application using Python on the Django web framework.
  • Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
  • Modified existing Python/Django modules to deliver data in the required formats.
  • Used the Python libraries NumPy and Pandas to organize and aggregate the data and adjust data formats, then used matplotlib to generate graphical reports (see the reporting sketch after this list).
  • Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS on AWS.
  • Involved in running all the Hive scripts through Hive, Impala, Hive on Spark, and Spark SQL.
  • Extracted files from RDBMS sources through Sqoop, placed them in HDFS, and processed them.
  • Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.
  • Successfully loaded files into HDFS from Teradata and then loaded them from HDFS into Hive and Impala.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs, triggered independently by time and data availability.
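
A minimal sketch of the SQLAlchemy ORM pattern described in the bullets above, assuming SQLAlchemy 1.4+ and the PyMySQL driver; the Customer model, columns, credentials, and RDS endpoint are hypothetical placeholders.

    # Hypothetical model persisted to a MySQL database on RDS.
    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()

    class Customer(Base):
        __tablename__ = "customers"
        id = Column(Integer, primary_key=True)
        name = Column(String(100), nullable=False)
        email = Column(String(255), unique=True)

    engine = create_engine(
        "mysql+pymysql://app_user:app_password@example.rds.amazonaws.com:3306/appdb"
    )
    Base.metadata.create_all(engine)

    Session = sessionmaker(bind=engine)
    with Session() as session:
        # Data submitted from the web form is persisted directly as ORM objects.
        session.add(Customer(name="Jane Doe", email="jane@example.com"))
        session.commit()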
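
The NumPy/Pandas/matplotlib reporting step might look something like the following; the CSV file and column names (region, order_date, amount) are made up for the example.

    # Illustrative reporting sketch with hypothetical input data.
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("daily_orders.csv")
    df["order_date"] = pd.to_datetime(df["order_date"])   # normalize the date format

    # Aggregate counts and totals per region, then derive an average with NumPy.
    summary = df.groupby("region")["amount"].agg(count="count", total="sum")
    summary["avg"] = np.round(summary["total"] / summary["count"], 2)

    # Render the aggregates as a simple bar chart for the report.
    summary["total"].plot(kind="bar", title="Order totals by region")
    plt.tight_layout()
    plt.savefig("order_totals_by_region.png")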

Confidential, San Jose, CA

Cloud Python Developer/ Data Engineer

Responsibilities:

  • Designed and developed RESTful APIs using Django REST Framework with server-side validation, ViewSets, Routers, and regular-expression-based routing (see the DRF sketch after this list).
  • Wrote Dockerfiles to build Docker images for the Django application and managed them with Docker Compose.
  • Designed and Developed the UI using front-end technologies like HTML, CSS, JavaScript, Typescript, Bootstrap, and JSON.
  • Created and extended the built-in user group model, imported Django Authentication, and set different user permissions to make management more convenient for managers.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Developed data pipeline using Sqoop, Hive, Pig and Java MapReduce to ingest claim and policy histories into HDFS for analysis.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL Database, Blob Storage, and Azure SQL Data Warehouse, including write-back in both directions.
  • Implemented large Lambda architectures using Azure Data platform capabilities like Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML and Power BI.
  • Designed end to end scalable architecture to solve business problems using various Azure Components like HDInsight, Data Factory, Data Lake, Storage and Machine Learning Studio.
  • Developed JSON Scripts for deploying the Pipeline in Azure Data Factory (ADF) that process the data using the SQL Activity.
  • Created pipelines to move data from on-premises servers to Azure Data Lake.
  • Analyzed existing database structure to determine the scope of migrations and application designs with Python frameworks.
  • Evaluated requirements and produced detailed technical specifications for software design and architecture, determining the scope of migration and application design using Python frameworks.
  • Developed Python scripts to sync data from GCP Spanner to Azure and monitored the jobs using Airflow.
  • Processed streaming data from Kafka topics using Scala and ingested the data into Cassandra.
  • Worked on Scala programming in developing Spark streaming jobs for building stream data platform integrating with Kafka.
  • Analyzed the data by performing Hive queries, ran Pig scripts, SparkSQL and Spark Streaming.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
  • Worked on Spark using Scala and SparkSQL for faster testing and processing of data.
  • Completed data extraction, aggregation, and analysis in HDFS using PySpark and stored the required data in Hive (see the PySpark sketch after this list).
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
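
The Django REST Framework setup described in the first bullet of this section typically combines a ModelSerializer, a ModelViewSet, and a DefaultRouter, roughly as in the sketch below; the Policy model and its fields are hypothetical, and each part would normally live in its own module.

    # models.py (hypothetical model)
    from django.db import models

    class Policy(models.Model):
        holder_name = models.CharField(max_length=100)
        premium = models.DecimalField(max_digits=10, decimal_places=2)
        status = models.CharField(max_length=20, default="active")

    # serializers.py / views.py
    from rest_framework import routers, serializers, viewsets

    class PolicySerializer(serializers.ModelSerializer):
        class Meta:
            model = Policy
            fields = ["id", "holder_name", "premium", "status"]

    class PolicyViewSet(viewsets.ModelViewSet):
        # ModelViewSet provides list/retrieve/create/update/destroy (CRUD) out of the box.
        queryset = Policy.objects.all()
        serializer_class = PolicySerializer

    # urls.py
    from django.urls import include, path

    router = routers.DefaultRouter()
    router.register(r"policies", PolicyViewSet)
    urlpatterns = [path("api/", include(router.urls))]

The router generates the list/detail routes automatically, which is what keeps the CRUD endpoints consistent across resources.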
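
A hedged PySpark sketch of the extract/aggregate/store-to-Hive flow mentioned above; the HDFS path, column names, and Hive database/table are placeholders.

    # Read raw events from HDFS, aggregate per day and event type,
    # and persist the result as a Hive table for downstream analysis.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("event-aggregation")
        .enableHiveSupport()
        .getOrCreate()
    )

    events = spark.read.parquet("hdfs:///data/raw/events/")
    daily = (
        events
        .withColumn("event_date", F.to_date("event_ts"))
        .groupBy("event_date", "event_type")
        .agg(F.count("*").alias("event_count"))
    )
    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
    daily.write.mode("overwrite").saveAsTable("analytics.daily_event_counts")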

Confidential

Data Engineer

Responsibilities:

  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups and log files.
  • Used Oozie to orchestrate the workflow.
  • Involved in loading data from LINUX file system to HDFS.
  • Analyzed data using Hadoop components Hive and Pig.
  • Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW tables and historical metrics (a Spark-SQL rendering of this pattern appears after this list).
  • Moved data from Oracle and MS SQL Server into HDFS using Sqoop and imported various formats of flat files into HDFS.
  • Implemented test scripts to support test driven development and continuous integration.
  • Used Amazon Cloud Watch to monitor and track resources on AWS.
  • Used version control tools Git to update the project with team members.
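
The trend-comparison Hive queries could be expressed roughly as below; this rendering uses PySpark with Hive support purely to keep the example in Python (the original work ran the queries in Hive directly), and the table and column names are hypothetical.

    # Compare freshly ingested sales against the historical EDW aggregate to surface
    # categories whose current volume is running ahead of their historical average.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("edw-trend-compare")
        .enableHiveSupport()
        .getOrCreate()
    )

    trends = spark.sql("""
        SELECT f.category,
               SUM(f.quantity)            AS fresh_qty,
               MAX(h.avg_weekly_quantity) AS hist_avg_qty
        FROM   staging.fresh_sales f
        JOIN   edw.category_history h
          ON   f.category = h.category
        GROUP  BY f.category
        HAVING SUM(f.quantity) > 1.1 * MAX(h.avg_weekly_quantity)
    """)
    trends.show()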
