Senior Cloud Data Engineer Resume

Carlsbad, California

SUMMARY

  • 9 years of experience as a Data Engineer, with analytical programming and coding in Python, Django, and Java.
  • Experience in all phases of the Software Development Life Cycle (SDLC) - Waterfall and Agile - across various workflows (requirement study, analysis, design, coding, testing, deployment, and maintenance) in web and client/server application development.
  • Experience in design, development, and implementation of Big Data applications using Hadoop ecosystem frameworks and tools such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Spark, Storm, HBase, Kafka, Flume, NiFi, Impala, Oozie, Zookeeper, Airflow, etc.
  • Expertise in developing applications using Python, Scala, and Java.
  • Experience working with several Python libraries, including Beautiful Soup, NumPy, Matplotlib, SciPy, and PyQt.
  • Good experience in software development with Python, using libraries such as PySpark and PostgreSQL drivers for database connectivity.
  • Hands-on experience with industry-standard IDEs like PyCharm, Jupyter Notebook.
  • Experience with an in-depth understanding of the strategy and practical implementation of AWS cloud technologies, including EC2, EBS, S3, VPC, RDS, SES, ELB, EMR, CloudFront, CloudFormation, ElastiCache, CloudWatch, CloudTrail, Redshift, Lambda, SNS, DynamoDB, and AWS Import/Export.
  • Experience in AWS cloud database migrations, converting existing Oracle and MS SQL Server databases to PostgreSQL, MySQL, and Aurora.
  • Experience building S3 buckets and managing bucket policies, and using S3 and Glacier for storage and backup on AWS.
  • Solid experience and understanding of implementing large-scale data warehousing programs and end-to-end data integration solutions on Snowflake Cloud and AWS Redshift.
  • Played a key role in Migrating Teradata objects into the Snowflake environment.
  • Expertise in full life cycle application development, with good experience in unit testing, Test-Driven Development (TDD), and Behavior-Driven Development (BDD).
  • Proficient in writing SQL Queries, Stored procedures, functions, packages, tables, views, triggers using relational databases like PostgreSQL.
  • Good experience in shell scripting, SQL Server, UNIX and Linux, and visualization tools such as Power BI and Tableau.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Installation and configuration of web hosting administration for HTTP, FTP, SSH, and RSH.
  • Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal sketch follows this list).
  • Migration of users from non-secure FTP protocol-based servers to more secure SFTP, FTPS, HTTPS based servers.
  • Experience with Snowflake Virtual Warehouses.
  • Strong understanding and hands-on experience setting up Elasticsearch in the AWS cloud environment.
  • Experience using XML, SOAP, and REST web services for interoperable software applications.
  • Experience in Agile development processes ensuring rapid and high-quality software delivery.
  • Well versed with Agile, SCRUM and Test-driven development methodologies.
  • Experience in handling errors/exceptions and debugging issues in large scale applications.
  • Highly motivated, dedicated, quick learner with a proven ability to work individually and as part of a team.
  • Excellent written and oral communication skills with results-oriented attitude.
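
The Databricks bullet above mentions multi-format extraction and Spark SQL aggregation; below is a minimal PySpark sketch of that pattern. The paths, column names, and the Spark 3.1+ unionByName option are assumptions for illustration, not the actual project code.

    # Minimal PySpark sketch: extract from multiple file formats, transform,
    # and aggregate to surface customer usage patterns (names are hypothetical).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("usage-patterns").getOrCreate()

    # Extract from multiple file formats.
    events_json = spark.read.json("dbfs:/raw/events/*.json")
    events_csv = spark.read.option("header", True).csv("dbfs:/raw/events_legacy/*.csv")
    events = events_json.unionByName(events_csv, allowMissingColumns=True)

    # Transform and aggregate usage per customer per day.
    usage = (
        events
        .withColumn("event_date", F.to_date("event_ts"))
        .groupBy("customer_id", "event_date")
        .agg(F.count("*").alias("events"),
             F.countDistinct("session_id").alias("sessions"))
    )

    # Persist the aggregate as Parquet for downstream reporting.
    usage.write.mode("overwrite").parquet("dbfs:/curated/customer_usage/")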

TECHNICAL SKILLS

Programming Languages: Python, JAVA, C#, C++, SQL, COBOL, JCL

Query Languages: SQL, PL/SQL

Operating Systems: Windows Vista/XP/7/8/10, Linux, Unix, OS X

Deployment Tools: AWS (EC2, S3, ELB, RDS, Glue), Heroku, Jenkins, Azure

Web Development: CSS, HTML, DHTML, XML, JavaScript, AngularJS, jQuery and AJAX

Web Servers: WebSphere, WebLogic, Apache, Gunicorn

Python Frameworks & Tools: Django, Flask, Web2py, Bottle, Pyramid, Swagger, RabbitMQ

Bug Tracking & Debugging Tools: Jira, Bugzilla, JUnit, gdb

Databases: Oracle 11g/10g/9i, Cassandra 2.0, MySQL, SQL Server 2008 R2, Data Warehousing

Cloud Computing: Amazon EC2/S3, Heroku, Google App Engine

Methodologies: Agile, Scrum and Waterfall

IDEs: Sublime Text, PyCharm, Eclipse, NetBeans, JDeveloper, WebLogic Workshop, RAD

PROFESSIONAL EXPERIENCE

Confidential

Senior Cloud Data Engineer

Responsibilities:

  • Build and create data pipelines from the ground up as part of an AWS implementation and migration covering all data domains for clinical claims and member data.
  • Develop, unit test and maintain applications that process structured, unstructured, semi-structured data into consumable formats for reporting and analytics.
  • Work with stakeholders throughout the organization to identify opportunities for leveraging company data to drive business solutions.
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
  • Developing data ingestion and integration solutions using AWS services such as DataSync, S3, Glue, Lambda, Lake Formation, Glue DataBrew, RDS, Redshift, and Athena (see the sketch after this list).
  • Participate in the end-to-end lifecycle of ETL implementations including business analysis, design, development, deployment and continuous improvement.
  • Communicating with end-users, technical teams, and senior management to collect requirements, describe data modeling decisions and data engineering strategy.
  • Assess the effectiveness and accuracy of new data sources and data-gathering techniques.
  • Performance tune Spark jobs to analyze model performance and data accuracy.
  • Involved in CI/CD deployment pipelines, automating manual processes and optimizing data delivery.
  • Coordinate with different functional teams to implement models and monitor outcomes.
  • Working with CloudFormation templates, and scheduling and monitoring execution plans.
  • Implemented ETL with application design principles and AWS architecture.
  • Work and collaborate with teams of architects and developers to complete tasks across projects of various sizes.
  • Contribute to the development of the organization by sharing knowledge with peers both within the team and across the organization, and providing mentorship to help develop data engineering expertise within our teams.
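
As referenced in the ingestion bullet above, here is a minimal AWS Glue (PySpark) job sketch of that pattern. The catalog database, table, column, and S3 bucket names are hypothetical placeholders, not the project's actual resources.

    # Minimal AWS Glue (PySpark) job: read a catalogued claims table and land
    # it on S3 as curated Parquet (all resource names are hypothetical).
    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read raw clinical-claims data registered in the Glue Data Catalog.
    claims = glue_context.create_dynamic_frame.from_catalog(
        database="raw_claims_db", table_name="clinical_claims"
    )

    # Drop obviously invalid rows before landing to the curated zone.
    valid_claims = claims.toDF().dropna(subset=["claim_id", "member_id"])

    # Write curated Parquet back to S3, partitioned by service year.
    (valid_claims.write
        .mode("overwrite")
        .partitionBy("service_year")
        .parquet("s3://curated-claims-bucket/clinical_claims/"))

    job.commit()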

Environment: AWS, Spark, Glue, Python, DataSync, Lambda, Lake Formation, Docker, PgSQL, NiFi, Jupyter, DataBrew, RDS, Redshift

Confidential, Carlsbad, California

Senior Data Engineer

Responsibilities:

  • Developed Spark programs and Python functions that perform transformations and actions on data sets.
  • Implement data pipeline solutions with a focus on collecting, managing, modeling, analyzing, and visualizing data.
  • Involved in data pipeline work across agronomy, remote sensing, and retail transaction data.
  • Generate and leverage system metrics to improve scalability and efficiency.
  • Implemented ETL using AWS Cloud Resources like Glue, S3, EC2, RDS.
  • Contribute across the software stack, including application development, infrastructure, continuous deployment, and improving our developer ecosystem.
  • Develop a data pipeline that incorporates multiple sources of structured and unstructured data and prepares it for modeling and reporting.
  • Build infrastructure that achieves reliability and accuracy and integrates seamlessly with data science tools and user-facing applications.
  • Worked on data processing technologies such as Spark, Airflow, Kinesis, and Kafka (see the orchestration sketch after this list).
  • Working with data scientists to incorporate their work into applications and reporting.
  • Contribute to the development of the organization by sharing knowledge with peers both within the team and across the organization, and providing mentorship to help develop data engineering expertise within our teams.
  • Understanding of typical data engineering architectural patterns such as ETL/ELT, streaming vs. batch processing, and actor model systems.
  • Experience working on projects that span multiple organizations and business units.
  • Work on a cross-functional agile team that is responsive to the needs of the customer and contributes to the overall technical strategy.
  • Agile software development experience including testing, code reviews, and CI/CD.
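
The orchestration sketch referenced above is a minimal Airflow 2.x DAG under the assumption of a daily batch schedule; the DAG id, task callables, and what they load are hypothetical placeholders.

    # Minimal Airflow 2.x DAG: a daily extract-then-load pipeline
    # (task logic is a placeholder for illustration only).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_transactions(**context):
        # Placeholder: pull retail transaction data for the run date.
        print("extracting transactions for", context["ds"])


    def load_to_warehouse(**context):
        # Placeholder: load the prepared partition into the warehouse.
        print("loading partition", context["ds"])


    with DAG(
        dag_id="retail_daily_pipeline",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_transactions)
        load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

        extract >> load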

Environment: Spark, AWS, Glue, RDS, EC2, Docker, Kubernetes, Python, PgSQL, NiFi, Jupyter, Kinesis, Airflow

Confidential

Senior Data Engineer

Responsibilities:

  • Work closely with Business Analysts and Product Owner to understand the requirements.
  • Developed applications using Spark to implement various aggregation and transformation functions with Spark RDDs and Spark SQL.
  • Used joins in Spark to join smaller datasets to large datasets without shuffling data across nodes.
  • Developed Spark Streaming jobs using Python to read messages from Kafka.
  • Downloaded JSON files from AWS S3 buckets.
  • Implemented ETL using AWS RedShift/Glue.
  • Used Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Python and NoSQL databases such as HBase and Cassandra (see the streaming sketch after this list).
  • Prototyped analysis and joining of customer data using Spark in Scala and processed it to HDFS.
  • Implemented Spark on EMR/Glue for processing big data across our OneLake in the AWS system.
  • Consumed and processed data from DB2.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Used Scala components to implement the credit line policy based on conditions applied to Spark DataFrames.
  • Extracted, transformed, and loaded data sources to generate CSV data files using Python programming and SQL queries.
  • Used pandas UDFs and array functions such as array_contains, array_distinct, flatten, map, sort, split, and arrays_overlap for filtering data.
  • Designed and implemented Sqoop for the incremental job to read data from DB2 and load it into Hive tables, and connected to Tableau via HiveServer2 for generating interactive reports.
  • Built NiFi flows for data ingestion, ingesting data from Kafka, microservices, and CSV files on edge nodes.
  • Implementing and orchestrating data pipelines using Oozie and Airflow.
  • Building automated pipelines using Jenkins and Groovy scripts.
  • Using shell commands to push environment and test files to AWS through Jenkins automated pipelines.
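
The streaming sketch referenced above is a minimal PySpark Structured Streaming job that reads JSON messages from Kafka and lands them on HDFS as Parquet. Broker addresses, the topic, schema, and paths are hypothetical, and Structured Streaming is used here as a simplification of the streaming flow described.

    # Minimal PySpark Structured Streaming sketch: Kafka JSON -> HDFS Parquet
    # (broker addresses, topic, schema, and paths are hypothetical).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    schema = StructType([
        StructField("account_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_ts", StringType()),
    ])

    # Read the JSON messages from Kafka.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
           .option("subscribe", "credit-events")
           .load())

    # Parse the Kafka value payload into typed columns.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", schema).alias("e"))
              .select("e.*"))

    # Continuously land the parsed stream on HDFS as Parquet.
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/credit_events/")
             .option("checkpointLocation", "hdfs:///checkpoints/credit_events/")
             .start())

    query.awaitTermination()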

Environment: Spark, Scala, AWS, EMR, Glue, RedShift, EC2, Docker, Python, PgSQL, NiFi, Airflow, Jupyter, Kafka

Confidential, Chicago, Illinois

Data Engineer

Responsibilities:

  • Work on requirements gathering, analysis and designing of the systems.
  • Developed Spark programs using Scala to compare the performance of Spark with Hive and SparkSQL.
  • Developed a Spark Streaming application to consume JSON messages from Kafka and perform transformations.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Implemented Spark using Scala and SparkSql for faster testing and processing of data.
  • Involved in developing a MapReduce framework that filters bad and unnecessary records.
  • Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirement.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs with Scala.
  • Used Spark API over Hadoop YARN as execution engine for data analytics using Hive.
  • Exported the analyzed data to the relational databases using Sqoop to further visualize and generate reports for the BI team.
  • Development of BI data lake POCs using AWS services including Athena, S3, EC2, Glue, and QuickSight.
  • Migrated the computational code from HQL to PySpark.
  • Worked with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.
  • Worked on migrating HiveQL into Impala to minimize query response time.
  • Advanced skills in Excel and in data visualization tools such as QuickSight, Tableau, and similar BI tools.
  • Responsible for migrating the code base to Amazon EMR and Amazon ecosystem components such as Redshift.
  • Created and designed several core data type mappings for customer data records using Elasticsearch.
  • Provided key production recommendations concerning overall cluster sizing and relevance queries for Elasticsearch.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Developed Python scripts to clean the raw data.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (see the sketch after this list).
  • Used AWS services like EC2 and S3 for small data sets processing and storage
  • Implemented Nifi flow topologies to perform cleansing operations before moving data into HDFS.
  • Worked on different file formats (ORCFILE, Parquet, Avro) and different Compression Codecs (GZIP, SNAPPY, LZO).
  • Created applications using Kafka, which monitors consumer lag within Apache Kafka clusters.
  • Worked on importing and exporting data into HDFS and Hive using Sqoop, built analytics on Hive tables using Hive Context in spark Jobs.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
  • Worked in Agile environment using Scrum methodology.
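
The sketch referenced above is a minimal PySpark example of importing data from S3 into an RDD and applying transformations and actions. The bucket, key pattern, and record layout are hypothetical.

    # Minimal PySpark sketch: load raw text from S3 into an RDD, then apply
    # transformations (map/filter) and actions (reduceByKey/take).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-rdd-cleanup").getOrCreate()
    sc = spark.sparkContext

    # Load raw delimited text from S3 into an RDD (one element per line).
    raw = sc.textFile("s3a://raw-logs-bucket/weblogs/2020/*.log")

    # Transformations: drop malformed rows and project the fields of interest.
    parsed = (raw.map(lambda line: line.split("\t"))
                 .filter(lambda parts: len(parts) >= 3)
                 .map(lambda parts: (parts[0], parts[2])))  # (user_id, page)

    # Actions: count page views per user and inspect a small sample.
    counts = parsed.map(lambda kv: (kv[0], 1)).reduceByKey(lambda a, b: a + b)
    print(counts.take(10))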

Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, PySpark, Cassandra, Oozie, NiFi, Solr, Shell Scripting, HBase, Scala, AWS, Maven, Java, JUnit, Agile methodologies, Hortonworks, SOAP, Python, Teradata, MySQL.

Confidential, Bloomington, Minnesota

Full Stack Python Developer with DevOps

Responsibilities:

  • Involved in architecture design, development, and implementation of Hadoop deployment, backup, and recovery systems.
  • Developed MapReduce programs in Python using Hadoop to parse the raw data, populate staging tables, and store the refined data in partitioned Hive tables.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Converted applications that were on MapReduce to PySpark which performed the business logic.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Expertise in synthesizing Machine learning, Predictive Analytics and Big data technologies into integrated solutions.
  • Imported Teradata datasets onto the Hive platform using Teradata JDBC connectors.
  • Was involved in writing FastLoad and MultiLoad scripts to load the tables.
  • Worked with different types of indexes and Collect Statistics in Teradata and improved execution strategy.
  • Worked with SQL Assistant and BTEQ to ingest data, execute queries and stored procedures, and update the tables.
  • Developed dashboard prototypes using the cloud dashboard tools Looker and AWS QuickSight, managing all aspects of the technical development.
  • Experience in big data using AWS Redshift with technologies such as AWS Athena and QuickSight.
  • Recommended changes to underlying core data type mappings in Elasticsearch.
  • Worked on extracting XML files using XPath and storing them in Hive tables.
  • Developed multiple Kafka producers and consumers per the software requirement specifications (see the sketch after this list).
  • Involved in designing the tables in Teradata while importing the data.
  • Developed the UNIX shell scripts for creating the reports from Hive data.
  • Experienced in managing and reviewing the Hadoop log files.
  • Main duties include resolving incidents and performing code migration from lower environments to production in case of code-related issues.
  • Responsible for code deployment into the production environment.
  • Developed Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Analyzed production issues to determine root cause and provided fix recommendations to the support team; created, developed, and tracked solutions to reported application errors.
  • Noted interruptions or bugs in operation and carried out mitigation and problem management.
  • Assisted with troubleshooting and issue resolution relating to current applications, providing assistance to the development team.
  • Coordinate with Support teams during application deployments.
  • Worked on production cluster system issues such as file system issues, connection issues, and system slowness, and monitored the HDFS file system for all digital analytics.
  • Extensively used UNIX for shell Scripting and pulling the Logs from the Server.
  • Used Solr/Lucene for indexing and querying the JSON formatted data.
  • Worked on different file formats like SequenceFiles, XML files, and MapFiles using MapReduce programs.
  • Worked with the Avro Data Serialization system to work with JSON data formats.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Completed testing of integration and tracked and solved defects.
  • Worked on AWS services like EC2 and S3 for small data sets.
  • Involved in loading data from the UNIX file system to HDFS.
  • Used the Oozie scheduler to automate pipeline workflows and orchestrate the MapReduce extraction jobs, and used Zookeeper to provide coordination services to the cluster.
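
The Kafka sketch referenced above is a minimal producer/consumer pair, assuming the kafka-python client; the broker addresses and topic name are hypothetical.

    # Minimal Kafka producer/consumer sketch with kafka-python
    # (brokers and topic are hypothetical).
    import json
    from kafka import KafkaProducer, KafkaConsumer

    BROKERS = ["broker1:9092", "broker2:9092"]
    TOPIC = "clickstream-events"

    # Producer: publish JSON-encoded events.
    producer = KafkaProducer(
        bootstrap_servers=BROKERS,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send(TOPIC, {"user_id": "u-123", "action": "page_view"})
    producer.flush()

    # Consumer: read events from the beginning of the topic and process them.
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKERS,
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        consumer_timeout_ms=10_000,  # stop iterating when the topic is idle
    )
    for message in consumer:
        print(message.offset, message.value)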

Environment: Hadoop (Hortonworks 2.2), Hive, Pig, HBase, Sqoop, Flume, Oozie, AWS, S3, EC2, EMR, Spring, Kafka, SQL Assistant, Python, UNIX, Teradata

Confidential, Charlotte, NC

Data Engineer

Responsibilities:

  • Involved in building database Model, APIs and Views utilizing Python, to build an interactive web-based solution.
  • Experience in handling, configuration, and administration of databases like MySQL and NoSQL databases like MongoDB and Cassandra.
  • Created a PySpark frame to bring data from DB2 to Amazon S3.
  • Experience in MongoDB installation, patching, troubleshooting, performance tracking/tuning, and back-up and recovery in dynamic environments.
  • Utilized PyQt to provide GUI for the user to create, modify and view reports based on client data. Created PyUnit test scripts and used for unit testing.
  • Developing ETL pipelines into and out of data warehouses using a combination of Python and Snowflake's SnowSQL, and writing SQL queries against Snowflake (see the sketch after this list).
  • Knowledgeable in AWS services including EC2, S3, RDS, Redshift, Glue, Athena, IAM, and QuickSight.
  • Facilitated common access to Elasticsearch.
  • Used sbt to develop Scala-coded Spark projects and executed them using spark-submit.
  • Involved in developing a linear regression model, built using Spark with the Scala API, to predict a continuous measurement and improve observations on wind turbine data.
  • Experience in developing applications using Python 3.6/3.7 and the Flask web framework backed by MS SQL/PostgreSQL databases, using SQLAlchemy as the Object Relational Mapper (ORM).
  • Designed and managed API system deployment using a fast HTTP server and Amazon AWS architecture.
  • Hands-on experience in using message brokers such as ActiveMQ and RabbitMQ.
  • Spun up HDInsight clusters and used Hadoop ecosystem tools such as Kafka, Oozie, Spark, and Databricks for real-time streaming analytics, and Sqoop, Pig, Hive, and Cosmos DB for batch jobs.
  • Scheduled workflow using Oozie workflow Engine.
  • Guided and migrated PostgreSQL and MySQL databases to AWS Aurora.
  • Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance.
  • Hands on experience in creating Docker containers and Docker consoles for managing the application life cycle.
  • In-depth knowledge of Snowflake database, schema, and table structures.
  • Very good hands-on experience working with large datasets and deep learning algorithms using Apache Spark and TensorFlow.
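
The Snowflake sketch referenced above is a minimal Python-to-Snowflake ETL step, assuming the snowflake-connector-python package; the credentials come from environment variables, and the warehouse, stage, and table names are hypothetical placeholders.

    # Minimal Python-to-Snowflake ETL sketch: stage raw files, then transform
    # into a reporting table (all object names are hypothetical).
    import os
    import snowflake.connector

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="STAGING",
    )

    cur = conn.cursor()
    try:
        # Load raw files from an external stage into a staging table.
        cur.execute("""
            COPY INTO STAGING.ORDERS_RAW
            FROM @raw_stage/orders/
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        """)
        # Transform and load into the reporting table.
        cur.execute("""
            INSERT INTO ANALYTICS.REPORTING.ORDERS_DAILY
            SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS revenue
            FROM STAGING.ORDERS_RAW
            GROUP BY order_date
        """)
    finally:
        cur.close()
        conn.close()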

Environment: Python 2.7/3.3, Django, PyQt, AWS, Kafka, PySpark, Oozie, Azure Databricks, RabbitMQ, Redshift, Apache Web Server, Python SDK, MySQL, Tableau, Linux, Docker, React JS
