
Data Engineer Resume


Hartford, CA

SUMMARY

  • 7+ years of IT experience in software development as a Big Data/Hadoop Developer, with strong knowledge of and experience in the Hadoop framework.
  • Expertise in Hadoop architecture and various components such as HDFS, YARN, High Availability, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce programming paradigm.
  • Experience with all aspects of development from initial implementation and requirement discovery, through release, enhancement, and support (SDLC & Agile techniques).
  • Experience in design, development, data migration, testing, support, and maintenance using Redshift databases.
  • Experience with Apache Hadoop technologies such as the Hadoop Distributed File System (HDFS), the MapReduce framework, Hive, Pig, Sqoop, Oozie, HBase, Spark, Scala, and Python.
  • Experience in AWS cloud solution development using Lambda, SQS, SNS, DynamoDB, Athena, S3, EMR, EC2, Redshift, Glue, and CloudFormation.
  • Experience using Microsoft Azure SQL Database, Data Lake, Azure ML, Azure Data Factory, Functions, Databricks, and HDInsight.
  • Working experience in big data on the cloud using AWS EC2 and Microsoft Azure; handled Redshift and DynamoDB databases with large volumes of data.
  • Extensive experience migrating on-premises Hadoop platforms to cloud solutions on AWS and Azure.
  • Experience writing Python ETL frameworks and PySpark jobs to process large volumes of data daily (a minimal PySpark sketch follows this summary).
  • Strong experience implementing data models and loading unstructured data using HBase, DynamoDB, and Cassandra.
  • Created multiple report dashboards, visualizations, and heat maps using the Tableau, QlikView, and Qlik Sense reporting tools.
  • Strong experience extracting and loading data with complex business logic using Hive from different data sources, and built ETL pipelines that process terabytes of data daily.
  • Experienced in transporting and processing real-time event streams using Kafka and Spark Streaming.
  • Hands on experience with importing and exporting data from Relational databases to HDFS, Hive and HBase using Sqoop.
  • Experienced in processing real-time data using Kafka 0.10.1 producers and stream processors; implemented stream processing with Kinesis, landing the data in an S3 data lake.
  • Experience in implementing multitenant models for the Hadoop 2.0 Ecosystem using various big data technologies.
  • Designed and developed Spark pipelines to ingest real-time event-based data from Kafka and other message queue systems, and processed large volumes of data with Spark batch jobs into the Hive data warehouse.
  • Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Document (FSD).
  • Designed data models for both OLAP and OLTP applications using Erwin and used both star and snowflake schemas in the implementations.
  • Capable of organizing, coordinating, and managing multiple tasks simultaneously.
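
The Python/PySpark ETL experience above can be illustrated with a minimal sketch. This is not code from any engagement below; the bucket, database, and column names are hypothetical placeholders, and it assumes a Spark installation with Hive support enabled.

# Minimal PySpark ETL sketch: read raw JSON from S3, apply basic cleansing,
# and append into a date-partitioned Hive table. All names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-etl")
    .enableHiveSupport()          # allows saveAsTable to write Hive tables
    .getOrCreate()
)

# Raw events landed by an upstream process (hypothetical path)
raw = spark.read.json("s3a://example-raw-bucket/events/")

cleaned = (
    raw.dropDuplicates(["event_id"])                     # drop replayed events
       .withColumn("event_date", F.to_date("event_ts"))  # derive partition column
       .filter(F.col("event_type").isNotNull())
)

# Append into a date-partitioned Hive table
(cleaned.write
        .mode("append")
        .partitionBy("event_date")
        .saveAsTable("analytics.daily_events"))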

TECHNICAL SKILLS

Programming: Python, R, SQL, HTML, CSS

Databases: SQL Server, MySQL, NoSQL, Hive, Hadoop, Redshift

Python: NLTK, spaCy, matplotlib, NumPy, Pandas, Scikit-Learn

Tools: Git, Docker, Flask, DVC, Keras, TensorFlow, PyTorch

Cloud: AWS (S3, EC2, Redshift, Lambda, EMR), Azure (Synapse Analytics, Azure SQL, ADF), GCP

Visualization: Tableau, Power BI, Sisense, Excel

Core Competencies: Supervised and Unsupervised ML, SVM, DNN, Text Analytics, MXNet, Big Data, NLP

PROFESSIONAL EXPERIENCE

Confidential | Hartford, CA

Data Engineer

Responsibilities:

  • Extensively worked in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
  • Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.
  • Supported continuous storage in AWS using Elastic Block Store (EBS), S3, and Glacier; created volumes and configured snapshots for EC2 instances.
  • Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Installed and configured Hive, wrote Hive UDFs, and used MapReduce for unit testing.
  • Participated in JAD meetings to gather requirements and understand the end users' system.
  • Followed an SDLC methodology for data warehouse development using Kanbanize.
  • Managed and reviewed Hadoop log files; tested and reported defects within an Agile methodology.
  • Created table structures for data marts in Netezza.
  • Built S3 buckets, managed their bucket policies, and used S3 and Glacier for storage and backup on AWS (see the boto3 sketch at the end of this section).
  • Built multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
  • Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Created Use Case Diagrams using UML to define the functional requirements of the application.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Designed, developed, and deployed projects on the GCP suite, including BigQuery, Dataflow, Dataproc, Google Cloud Storage, Composer, and Looker.
  • Created jobs and transformations in Pentaho Data Integration to generate reports and transfer data from HBase to an RDBMS.
  • Designed HBase schemas based on the requirements and performed HBase data migration and validation.
  • Created automated pipelines in AWS CodePipeline to deploy Docker containers to AWS ECS services.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization, and user report generation.
  • Performed data mapping and data design (data modeling) to integrate data across multiple databases into the EDW.
  • Worked on Data modeling, Advanced SQL with Columnar Databases using AWS.
  • Worked with the NoSQL database HBase for real-time data analytics.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Generated various presentable reports and documentation using report designer and pinned reports in Erwin.

Environment: Hadoop, Agile, Hive, Netezza, PL/SQL, HBase, GCP, AWS, NoSQL, Oozie 5, MongoDB, SSRS, SSIS, OLTP, OLAP, Puppet
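
A minimal boto3 sketch of the S3 work described above (bucket creation, a bucket policy, and a Glacier lifecycle rule for backups). The bucket name, account ID, role, and prefix are hypothetical placeholders; this is an illustrative outline rather than the exact configuration used.

# boto3 sketch: create a bucket, attach a restrictive bucket policy, and add a
# lifecycle rule that transitions backups to Glacier. All names are placeholders.
import json
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket = "example-backup-bucket"

s3.create_bucket(Bucket=bucket)

# Allow read-only access to a single IAM role (placeholder ARN)
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:role/etl-reader"},
        "Action": ["s3:GetObject"],
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

# Transition objects under backups/ to Glacier after 30 days
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "backups-to-glacier",
            "Status": "Enabled",
            "Filter": {"Prefix": "backups/"},
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }]
    },
)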

Confidential | Jacksonville, FL

Data Engineer

Responsibilities:

  • Involved in the full life cycle of the project, from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
  • Set up and built AWS infrastructure across various services by writing CloudFormation templates (CFTs) in JSON and YAML.
  • Developed CloudFormation scripts to build EC2 instances on demand.
  • Created IAM roles, users, and groups and attached policies to grant least-privilege access to resources.
  • Updated bucket policies with IAM roles to restrict user access, and configured AWS Identity and Access Management (IAM) groups and users for improved login authentication.
  • Created topics in SNS to send notifications to subscribers as per the requirement.
  • Moved data from Oracle to HDFS using Sqoop.
  • Created Hive Tables, loaded transactional data from Oracle using Sqoop and worked with highly unstructured and semi structured data.
  • Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data.
  • Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
  • Wrote scripts to distribute queries for performance-test jobs in the Amazon data lake.
  • Developed optimal strategies for distributing web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
  • Developed Pig Latin scripts to replace the existing legacy process on Hadoop, feeding the data to AWS S3.
  • Worked on CDC tables using a Spark application to load data into dynamic-partition-enabled Hive tables (see the PySpark sketch at the end of this section).
  • Designed and developed automation test scripts using Python.
  • Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
  • Analyzed the SQL scripts and designed solutions to implement them using PySpark.
  • Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in HDFS.
  • Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Supported data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud; performed export and import of data into S3.
  • Involved in designing the HBase row key to store text and JSON as key values, structuring the key so rows can be retrieved and scanned in sorted order.
  • Created Hive tables and worked on them using HiveQL.
  • Designed and implemented static and dynamic partitioning and bucketing in Hive.
  • Developed multiple POCs using PySpark and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL; and developed syllabus/curriculum data pipelines from syllabus/curriculum web services into HBase and Hive tables.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.

Environment: AWS, Hadoop, Hive, YARN, HBase, SSRS, SSIS, Oracle Database 11g, Oracle BI tools, Tableau, MS Excel, Python, Naive Bayes, SVM, K-means, ANN, Regression, MS Access, SQL Server Management Studio.
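
A minimal PySpark sketch of the CDC-style load into a dynamic-partition-enabled Hive table mentioned above. Table, path, and column names are hypothetical, and the deduplicate-by-latest-change step is one common approach rather than the exact logic used.

# PySpark sketch: keep only the latest CDC record per key, then append into a
# dynamic-partition-enabled Hive table. All names and paths are placeholders.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("cdc-to-hive")
    .enableHiveSupport()
    # Hive dynamic-partition settings (relevant when appending to partitioned tables)
    .config("hive.exec.dynamic.partition", "true")
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .getOrCreate()
)

# CDC records staged by an upstream extract (hypothetical location)
cdc = spark.read.parquet("s3a://example-stage-bucket/cdc/orders/")

# Deduplicate: keep the most recent change per primary key
latest = (
    cdc.withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("order_id").orderBy(F.col("change_ts").desc())
        ),
    )
    .filter("rn = 1")
    .drop("rn")
)

# Partition values for order_date are resolved dynamically at write time
(latest.write
       .mode("append")
       .partitionBy("order_date")
       .saveAsTable("warehouse.orders_cdc"))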

Confidential | San Diego, CA

Data Engineer

Responsibilities:

  • Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
  • Designed cloud architectures for customers looking to migrate to or develop new PaaS, IaaS, or hybrid solutions utilizing Amazon Web Services (AWS).
  • Designed, built, configured, tested, installed, managed, and supported all aspects and components of the application development environments in AWS.
  • Utilized AWS CloudFormation to create new AWS environments following best practices in VPC/subnet design.
  • Analyzed the business, technical, functional, performance and infrastructure requirements needed to access and process large amounts of data.
  • Coordinated with the Dev, DBA, QA, and IT Operations environments to ensure there are no resource conflicts.
  • Worked within and across agile teams to design, develop, test, implement, and support technical solutions across a full stack of development tools and technologies, tracking all stories in JIRA.
  • Responsible for the development and maintenance of processes and associated scripts/tools for automated build, testing, and deployment of the products to various development environments.
  • Managed the production server infrastructure environment, collaborated with the development team to troubleshoot and resolve issues, and delivered product releases through frequent, zero-downtime deployments.
  • Extensively involved in infrastructure as code, execution plans, resource graphs, and change automation using Terraform; managed AWS infrastructure as code with Terraform.
  • Created Terraform scripts for EC2 instances, Elastic Load Balancers, and S3 buckets.
  • Managed different infrastructure resources, such as physical machines, VMs, and Docker containers, using Terraform, which supports multiple cloud service providers including AWS.
  • Built Jenkins jobs to create AWS infrastructure from GitHub repos containing Terraform code.
  • Configured the ELK stack in conjunction with AWS, using Logstash to output data to AWS S3.
  • Involved in AWS EC2-based automation through Terraform, Ansible, Python, and Bash scripts (see the boto3 sketch at the end of this section); adopted new features as they were released by Amazon, including ELB and EBS.
  • Experience in Virtualization technologies and worked with containerizing applications.
  • Automated deployment of application using deployment tool (Ansible).

Environment: AWS, PaaS, IaaS, JSON, EC2, Python, Pandas, Regression, Classification, CNN, RNN, Random Forest, TensorFlow, Keras, Seaborn, NumPy, SVM, Preprocessing, SQL, AWS SageMaker, AWS S3.
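
A minimal boto3 sketch of the kind of EC2 automation referenced above. The AMI ID, key pair, subnet, and tags are hypothetical placeholders; the actual automation combined Terraform, Ansible, Python, and Bash as noted, and this sketch shows only a Python/boto3 piece in isolation.

# boto3 sketch: launch and tag a single EC2 instance, then wait until it is
# running. AMI, key pair, and subnet IDs are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",          # placeholder AMI
    InstanceType="t3.medium",
    KeyName="example-keypair",                # placeholder key pair
    SubnetId="subnet-0123456789abcdef0",      # placeholder subnet
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [
            {"Key": "Environment", "Value": "dev"},
            {"Key": "ManagedBy", "Value": "automation"},
        ],
    }],
)

instance_id = response["Instances"][0]["InstanceId"]

# Block until the instance is running before handing it to later steps
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
print(f"Launched {instance_id}")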

Confidential

Data Analyst

Responsibilities:

  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation for the Business Intelligence team (see the Sqoop wrapper sketch at the end of this section).
  • Collaborated with the business to define requirements and recommend optimized solutions. Ability to quickly understand complex business processes and associated data sets.
  • Consulted with internal and external stakeholders to identify specific needs within customer application modules and documented requirements for data, reports, analysis, metadata, training, service levels, data quality, performance, and troubleshooting.
  • Extracted data from MySQL into HDFS using Sqoop and developed simple to complex MapReduce jobs.
  • Responsible for writing complicated SQL queries with a good understanding of transactional databases.
  • Assisted reporting teams in developing Tableau visualizations and dashboards using Tableau Desktop.
  • Analyzed data by running Hive queries and Pig scripts to understand user behavior, and created partitioned tables in Hive.
  • Administered and supported the Hortonworks distribution.
  • Wrote Korn shell, Bash, and Perl scripts to automate most database maintenance tasks.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Monitored running MapReduce programs on the cluster.
  • Responsible for loading data from UNIX file systems to HDFS.
  • Using PHP, created documents and executed software designs involving complicated workflows or multiple product areas.
  • Owned all requests for an alternate UNIX/Oracle-based system that required bug fixes, change requests, and tuning; performed implementation, testing, and documentation for this system.
  • Consulted with project managers, business analysts, and development teams on application development and business plans.
  • Installed and configured Hive and Created Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Implemented the workflows using the Apache Oozie framework to automate tasks.
  • Developed scripts to automate data management end to end and to sync data between the clusters.
  • Designed, developed, tested, and deployed Power BI scripts and performed detailed analytics.
  • Performed DAX queries and functions in Power BI.

Environment: Apache Hadoop, Java, Bash, ETL, MapReduce, Hive, Pig, Hortonworks, deployment tools, DataStax, flat files, Oracle 11g/10g, MySQL, Windows NT, UNIX, Sqoop, Oozie, Tableau.
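
A minimal Python sketch of a wrapper around the Sqoop CLI for the HDFS-to-RDBMS exports mentioned above. It assumes the sqoop binary is on the PATH; the JDBC URL, credentials file, table, and HDFS path are hypothetical placeholders.

# Python sketch: thin wrapper around the Sqoop CLI to export analyzed data from
# HDFS to MySQL for BI reporting. All connection details are placeholders.
import subprocess

def sqoop_export(export_dir: str, table: str) -> None:
    """Run a Sqoop export and raise if the job fails."""
    cmd = [
        "sqoop", "export",
        "--connect", "jdbc:mysql://db-host:3306/reporting",
        "--username", "bi_user",
        "--password-file", "/user/bi_user/.sqoop_pwd",  # keep passwords off the CLI
        "--table", table,
        "--export-dir", export_dir,
        "--num-mappers", "4",
    ]
    subprocess.run(cmd, check=True)  # check=True raises CalledProcessError on failure

if __name__ == "__main__":
    sqoop_export("/user/hive/warehouse/analytics.db/daily_events", "daily_events")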
