
Data Engineer Resume


Bloomfield, CT

SUMMARY

  • Around 8 years of professional experience in all phases of the SDLC, including requirements analysis, application design, development, integration, maintenance, installation, implementation, and testing of various client-server and web applications on the Big Data ecosystem.
  • Extensively worked in PL/SQL, creating stored procedures, clusters, packages, database triggers, exception handlers, cursors, and cursor variables.
  • Expertise in designing data-intensive applications using the Hadoop ecosystem, Big Data analytics, cloud data engineering, data warehouse, data visualization, reporting, and data quality solutions.
  • Hands-on experience in installing, configuring, monitoring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Hortonworks, and Flume.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
  • Experience in the configuration and administration of relational databases like MySQL and NoSQL databases like MongoDB and Cassandra.
  • Experience in designing, developing, and deploying projects on the GCP suite, including BigQuery, Dataflow, Dataproc, Google Cloud Storage, Composer, Looker, etc.
  • Involved in designing the data model in Hive for migrating the ETL process into Hadoop, and wrote Pig scripts to load data into the Hadoop environment.
  • Experience in setting up and building AWS infrastructure resources like VPC, EC2, S3, IAM, EBS, Lambda, Security Groups, IaaS, RDS, DynamoDB, CloudFront, Elasticsearch, SNS, and CloudFormation.
  • Data modeling and database development for OLTP and OLAP (Star Schema, Snowflake Schema, data warehouses, data marts, multidimensional modeling, and cube design), business intelligence, and data mining.
  • Extensively used SQL, NumPy, pandas, scikit-learn, Spark, and Hive for data analysis and model building.
  • Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
  • Responsible for designing and building a data lake using Hadoop and its ecosystem components.
  • Working experience creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka, and building Spark DataFrames using Python (a minimal sketch follows this list).
  • Used AWS Lambda to develop APIs that manage servers and run code in AWS.
  • Proficient with container systems like Docker and container orchestration platforms like EC2 Container Service (ECS) and Kubernetes; also worked with Terraform.
  • Experience with complex data processing pipelines, including ETL and data ingestion dealing with unstructured and semi-structured data.
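
A minimal PySpark Structured Streaming sketch of the kind of real-time Spark/Kafka pipeline described in the streaming bullet above; the broker address, topic name, and event schema are assumptions for illustration, and the spark-sql-kafka connector package is assumed to be available:

```python
# Illustrative only: a hypothetical "clickstream" topic read from a local
# Kafka broker and parsed into a typed DataFrame with Structured Streaming.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("clickstream-stream").getOrCreate()

schema = StructType([
    StructField("user_id", StringType()),
    StructField("event", StringType()),
    StructField("ts", TimestampType()),
])

# Read the Kafka topic as a streaming DataFrame and parse the JSON payload.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "clickstream")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Write parsed events to the console for inspection; a real job would sink to S3/HDFS.
query = events.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```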

TECHNICAL SKILLS

Cloud Technologies: Amazon Web Services (IAM, S3, EC2, VPC, ELB, Route53, RDS, Auto Scaling, CloudFront), Chef, Consul, Docker, Rackspace, and GCP.

DevOps Tools: UrbanCode Deploy, Git, Jenkins (CI), Puppet, Chef, Ansible, AWS.

Languages: Python, SQL, and shell scripting.

Databases: MySQL, MongoDB, Cassandra, SQL Server.

Web/App Server: Apache, IIS, HIS, Tomcat, WebSphere Application Server, JBoss.

CI Tools: Hudson, Jenkins, Bamboo, CruiseControl.

DevOps and Other Tools: Jenkins, Perforce, Docker, AWS, Chef, Puppet, Ant, Atlassian Jira, Ansible, OpenStack, SaltStack, Splunk.

PROFESSIONAL EXPERIENCE

Confidential, Bloomfield, CT

Data Engineer

Responsibilities:

  • Involved in all phases of the SDLC, including requirement gathering, design, analysis, testing of customer specifications, development, and deployment of the application, and designed reliable and scalable data pipelines.
  • Worked on data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
  • Built a serverless ETL process in AWS Lambda so that new files landing in the S3 bucket are catalogued immediately (a hedged sketch follows this list).
  • Worked with AWS SQS to consume data from S3 buckets.
  • Worked with relational SQL and NoSQL data stores, including PostgreSQL and Hadoop.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Deployed applications on AWS using Elastic Beanstalk; implemented continuous integration and delivery (CI/CD) using Jenkins and Puppet.
  • Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries. This allowed for a more reliable and faster reporting interface, giving sub-second query response for basic queries.
  • Responsible for designing logical and physical data models for various data sources on Confidential Redshift.
  • Wrote scripts and an indexing strategy for a migration to Confidential Redshift from SQL Server and MySQL databases.
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Experience in building multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
  • Implemented Docker to package the final code and set up development and testing environments using Docker Hub, Docker Swarm, and Docker container networks.
  • Elasticsearch experience, including capacity planning and cluster maintenance; continuously looked for ways to improve and set a very high bar in terms of quality.
  • Implemented a real-time log analytics pipeline using Elasticsearch.
  • Set up and configured Elasticsearch in a POC test environment to ingest over a million records from an Oracle database.
  • Designed the data models used in data-intensive AWS Lambda applications aimed at complex analysis, creating analytical reports for end-to-end traceability, lineage, and definition of key business elements from Aurora.
  • Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 buckets in AWS.
  • Loaded data into Amazon Redshift and used AWS CloudWatch to collect and monitor AWS RDS instances within Confidential.
  • Used Teradata Studio and Teradata SQL Assistant to run SQL queries.
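
A hedged sketch of the serverless cataloguing step mentioned above: an S3-triggered Lambda handler that reacts to newly landed objects by starting a Glue crawler. The crawler name and event shape are assumptions, not the project's actual configuration.

```python
# Assumed setup: this Lambda is wired to S3 ObjectCreated events, and a Glue
# crawler (name below is a placeholder) points at the same bucket/prefix.
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # Each record corresponds to one newly created S3 object.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New object s3://{bucket}/{key}; refreshing the Glue catalog")
    try:
        glue.start_crawler(Name="adobe-landing-crawler")  # hypothetical crawler name
    except glue.exceptions.CrawlerRunningException:
        # Crawler is already running; the new object will be picked up in that run.
        pass
    return {"status": "ok"}
```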

Environment: AWS, Hadoop, Python, MySQL, Jenkins, API, Teradata, GitHub, Oracle Database 12c/11g, DataStage, SQL Server 2017/2016/2012/2008.

Confidential

Data Engineer

Responsibilities:

  • Involved in all phases of the SDLC, including requirement gathering, design, analysis, testing of customer specifications, development, and deployment of the application, and designed reliable and scalable data pipelines.
  • Worked with various complex queries, subqueries, and joins to check the validity of loaded and imported data.
  • Worked with PowerShell and Unix scripts for file transfer, emailing, and other file-related tasks.
  • Designed and implemented ETL pipelines from various relational databases to the data warehouse using Apache Airflow (an illustrative DAG sketch follows this list).
  • Worked on data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Worked on data transformation and retrieval from mainframes to Oracle using SQL*Loader and control files.
  • Created Tableau visualizations by connecting to Hadoop on AWS Elastic MapReduce (EMR).
  • Developed a custom ETL solution, batch processing, and a real-time data ingestion pipeline to move data in and out of Hadoop using Python and shell scripts.
  • Developed PySpark and SparkSQL code to process the data in Apache Spark on Amazon EMR to perform the necessary transformations based on the STMs developed.
  • Developed data integration strategies for data flow between disparate source systems and Big Data enabled Enterprise Data Lake.
  • Experience in building multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
  • Built a serverless ETL process in AWS Lambda so that new files landing in the S3 bucket are catalogued immediately.
  • Worked with AWS SQS to consume data from S3 buckets.
  • Worked with relational SQL and NoSQL data stores, including PostgreSQL and Hadoop.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Worked with and learned a great deal about Amazon Web Services (AWS) cloud services like EC2, S3, and EMR.
  • Worked on data cleaning and reshaping and generated segmented subsets using NumPy and pandas in Python (see the pandas sketch after this list).
  • Developed and deployed to production multiple projects in the CI/CD pipeline for real-time data distribution, storage, and analytics, persisting to S3, HDFS, and Postgres.
  • Configured CloudWatch, Lambda, SQS, and SNS to send alert notifications.
  • Experience in designing Terraform configurations and deploying them with Cloud Deployment Manager to spin up resources like cloud virtual networks and Compute Engine instances in public and private subnets, along with autoscalers, in Google Cloud Platform.
  • Experience in designing, architecting, and implementing scalable cloud-based web applications using AWS and GCP.
  • Set up GCP firewall rules to allow or deny traffic to and from VM instances based on specified configuration, and used GCP Cloud CDN (content delivery network) to deliver content from GCP cache locations, drastically improving user experience and latency.
  • Created architecture stack blueprint for data access with NoSQL Database Cassandra.
  • Deployed the Big Data Hadoop application using Talend on the AWS (Amazon Web Services) cloud.
  • Experience in providing highly available and fault-tolerant applications utilizing orchestration technologies like Kubernetes and Apache Mesos on Google Cloud Platform.
  • Experience with blue/green deployment strategies, creating new applications identical to the existing production environment using CloudFormation templates and Route53 weighted records to redirect traffic from the old environment to the new environment via DNS.
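
A minimal Apache Airflow sketch of the relational-source-to-warehouse ETL pipelines referenced above; the DAG id, schedule, and extract/load placeholder are illustrative assumptions rather than the project's actual code:

```python
# Sketch of a daily DAG with a single Python task; a real pipeline would use
# provider hooks/operators (e.g. Postgres, Redshift) and several dependent tasks.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_and_load(**context):
    # Placeholder for the extract/transform/load logic (source DB -> warehouse).
    print("extracting from the relational source and loading into the warehouse")

with DAG(
    dag_id="relational_to_warehouse",  # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="extract_and_load",
        python_callable=extract_and_load,
    )
```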
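
And a small pandas/NumPy sketch of the cleaning, reshaping, and segmentation work noted above; the file path, column names, and thresholds are hypothetical:

```python
# Illustrative cleaning and segmentation on a hypothetical events.csv file.
import numpy as np
import pandas as pd

df = pd.read_csv("events.csv")  # assumed input file

# Basic cleaning: drop duplicates, fill missing spend, parse dates.
df = df.drop_duplicates()
df["spend"] = df["spend"].fillna(0)
df["event_date"] = pd.to_datetime(df["event_date"], errors="coerce")

# Segment rows into spend tiers and reshape into a compact summary table.
df["tier"] = np.where(df["spend"] >= 100, "high", "low")
summary = df.pivot_table(index="tier", values="spend", aggfunc=["count", "mean"])
print(summary)
```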

Environment: AWS, Hadoop, Hive, HBase, Spark, Oozie, Kafka, MySQL, GCP, Jenkins, API, Snowflake, PowerShell, GitHub, Oracle Database 12c/11g, DataStage, SQL Server 2017/2016/2012/2008.

Confidential | Jacksonville, FL

Data Engineer

Responsibilities:

  • Involved in all phases of the SDLC, including requirement gathering, design, analysis, testing of customer specifications, development, and deployment of the application.
  • Involved in designing and deploying a large application utilizing almost the entire AWS stack (including EC2, Route53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.
  • Worked on a migration project to move applications from a traditional data center to AWS using AWS services.
  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances for specific applications.
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets; assisted the team in deploying to AWS and the cloud platform.
  • Managed IAM policies providing access to different AWS resources, and designed and refined the workflows used to grant access.
  • Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Designed AWS CloudFormation templates to create custom-sized VPCs, subnets, and NAT to ensure successful deployment of web applications and database templates.
  • Launched Compute (EC2) and DB (Aurora, Cassandra) instances from Amazon Management Console and CLI.
  • Installed and configured Splunk Universal Forwarders on both UNIX (Linux, Solaris, and AIX) and Windows Servers.
  • Hands on experience in customizing Splunk dashboards, visualizations, configurations using customized Splunk queries.
  • Experience in building multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
  • Implemented Docker to package the final code and set up development and testing environments using Docker Hub, Docker Swarm, and Docker container networks.
  • Elasticsearch experience, including capacity planning and cluster maintenance; continuously looked for ways to improve and set a very high bar in terms of quality.
  • Implemented a real-time log analytics pipeline using Elasticsearch.
  • Set up and configured Elasticsearch in a POC test environment to ingest over a million records from an Oracle database (see the sketch after this list).
  • Designed the data models used in data-intensive AWS Lambda applications aimed at complex analysis, creating analytical reports for end-to-end traceability, lineage, and definition of key business elements from Aurora.
  • Deployed applications on AWS using Elastic Beanstalk; implemented continuous integration and delivery (CI/CD) using Jenkins and Puppet.
  • Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries. This allowed for a more reliable and faster reporting interface, giving sub-second query response for basic queries.
  • Worked on Google Cloud Platform (GCP) services like Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver Monitoring, and Cloud Deployment Manager.
  • Set up GCP firewall rules to allow or deny traffic to and from VM instances based on specified configuration, and used GCP Cloud CDN (content delivery network) to deliver content from GCP cache locations, drastically improving user experience and latency.
  • Worked on GKE topology diagrams covering masters, workers, RBAC, Helm, kubectl, and ingress controllers.
  • Created projects, VPCs, subnetworks, and GKE clusters for the QA3, QA9, and prod environments using Terraform.
  • Worked on a Jenkinsfile with multiple stages: checking out a branch, building the application, testing, pushing the image to GCR, deploying to QA3, deploying to QA9, acceptance testing, and finally deploying to prod.
  • Configured and deployed OpenStack Enterprise master hosts and OpenStack node hosts.
  • Experienced in deploying applications on Apache web server, Nix, and application servers like Tomcat and JBoss.
  • Extensively used Splunk Search Processing Language (SPL) queries, Reports, Alerts and Dashboards.
  • Installed and implemented the Splunk App for Enterprise Security, documented best practices for the installation, and performed knowledge transfer on the process.
  • Used Splunk DB Connect for real-time data integration between Splunk Enterprise and databases.
  • Virtualized servers using Docker for test and development environment needs, and automated configuration using Docker containers.
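
A hedged sketch of the Oracle-to-Elasticsearch POC ingest described above, using the elasticsearch and python-oracledb client libraries; the hosts, credentials, table, and index names are placeholders:

```python
# Streams rows from an Oracle table into an Elasticsearch index in bulk batches,
# so millions of records never sit in memory at once. All names are assumptions.
import oracledb
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
conn = oracledb.connect(user="poc_user", password="change_me", dsn="dbhost/ORCLPDB1")

def docs():
    with conn.cursor() as cur:
        cur.execute("SELECT id, name, created_at FROM customers")  # hypothetical table
        cols = [d[0].lower() for d in cur.description]
        for row in cur:
            doc = dict(zip(cols, row))
            yield {"_index": "customers", "_id": doc["id"], "_source": doc}

ok, errors = helpers.bulk(es, docs(), chunk_size=5000, raise_on_error=False)
print(f"indexed {ok} documents, {len(errors)} errors")
```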

Environment: Amazon Web Services, IAM, S3, RDS, EC2, VPC, GCP, CloudWatch, Bitbucket, Chef, Puppet, Ansible, Docker, Apache HTTPD, Apache Tomcat, JBoss, JUnit, Cucumber, Python.

Confidential | San Diego, CA.

Data Engineer

Responsibilities:

  • Worked on a scalable distributed data system using the Hadoop ecosystem on AWS EMR and MapR (the MapR data platform).
  • Developed simple to complex MapReduce streaming jobs using Python, Hive, and Pig.
  • Used various compression mechanisms to optimize MapReduce jobs to use HDFS efficiently.
  • Used the ETL component Sqoop to extract data from MySQL and load it into HDFS.
  • Performed ETL processes on the business data and created a Spark pipeline that can efficiently perform the ETL process.
  • Wrote Hive queries and Pig scripts to study customer behavior by analyzing the data.
  • Loaded data into Hive tables from Hadoop Distributed File System (HDFS) to provide SQL-like access on Hadoop data.
  • Wrote Python scripts to process semi-structured data in formats like JSON.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Troubleshot and found bugs in the Hadoop applications, working with the testing team to clear them all.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 buckets in AWS.
  • Loaded data into Amazon Redshift and used AWS CloudWatch to collect and monitor AWS RDS instances within Confidential.
  • Used the Python API to develop Kafka producers and consumers for writing Avro schemas (an illustrative producer sketch follows this list).
  • Developed and executed a migration strategy to move Data Warehouse from an Oracle platform to AWS Redshift.
  • Developed the PySpark code for AWS Glue jobs and for EMR.
  • Installed the Ganglia monitoring tool to generate reports about the Hadoop cluster, such as CPUs running and hosts up and down, and performed operations to maintain the Hadoop cluster.
  • Responsible for analyzing and cleaning data using Spark SQL queries.
  • Handled importing data from various data sources, performed transformations using Spark, and loaded the data into Hive (see the Hive load sketch after this list).
  • Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
  • Used Scala to write the code for all the use cases in Spark, gained extensive experience with Scala for data analytics on the Spark cluster, and performed map-side joins on RDDs.
  • Explored various Spark modules and worked with DataFrames, RDDs, and SparkContext.
  • Built an on-demand, secure EMR launcher with custom spark-submit steps using S3 events, SNS, KMS, and a Lambda function.
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on a few exceptions raised by applications.
  • Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
  • Determined the viability of a business problem for a Big Data solution with PySpark.
  • Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Monitored multiple Hadoop cluster environments using Ganglia, and monitored workload, job performance, and capacity planning using MapR.
  • Great working experience with Splunk for real-time log data monitoring.
  • Built clusters in the AWS environment using EMR with S3, EC2, and Redshift.
  • Worked with Databricks to connect different sources and transform data for storage in the cloud platform.
  • Experienced in building an optimized data integration platform that provides efficient performance under growing data volumes.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation by our BI team.
  • Worked with the DevOps team to clusterize the NiFi pipeline on EC2 nodes, integrated with Spark, Kafka, and Postgres running on other instances using SSL handshakes, in QA and production environments.
  • Great hands-on experience with PySpark, using Spark libraries via Python scripting for data analysis.
  • Worked with BI (Tableau) teams on dataset requirements and have good working experience with data visualization.
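
A hedged sketch of a Python Kafka producer writing Avro-encoded records, as in the producer/consumer bullet above; it uses the legacy confluent-kafka AvroProducer API, and the broker, Schema Registry URL, topic, and schema are all assumptions:

```python
# Publishes Avro records to a hypothetical "orders" topic; requires a running
# Kafka broker and Confluent Schema Registry at the addresses below.
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

value_schema = avro.loads("""
{
  "namespace": "example.avro",
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
""")

producer = AvroProducer(
    {
        "bootstrap.servers": "localhost:9092",
        "schema.registry.url": "http://localhost:8081",
    },
    default_value_schema=value_schema,
)

producer.produce(topic="orders", value={"order_id": "A-100", "amount": 42.5})
producer.flush()
```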
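
And a minimal PySpark sketch of the import-transform-load-into-Hive flow mentioned above; the HDFS path, column names, and Hive table are illustrative assumptions:

```python
# Reads raw CSV landed on HDFS, applies simple transformations, and appends the
# result into a partitioned Hive table. Paths and names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = (
    SparkSession.builder.appName("source-to-hive")
    .enableHiveSupport()
    .getOrCreate()
)

raw = spark.read.option("header", "true").csv("hdfs:///data/landing/orders/")

# Standardize types and drop rows without a primary key.
clean = (
    raw.withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
       .filter(col("order_id").isNotNull())
)

# Append into a partitioned Hive table for downstream Hive/Spark SQL queries.
clean.write.mode("append").partitionBy("order_date").saveAsTable("analytics.orders")
```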

Environment: MapReduce, AWS, S3, EC2, EMR, RedShift, Glue, Java, HDFS, Hive, Pig, Tez, Oozie, HBase, Spark, Scala, Spark SQL, Kafka, Python, Putty, Pyspark, Cassandra, Shell Scripting, ETL, YARN, Splunk, Sqoop, LINUX, Cloudera, Ganglia, SQL Server.

Confidential

Software Engineer

Responsibilities:

  • Performed Requirement Gathering & Analysis by actively soliciting, analyzing, and negotiating customer requirements and prepared the requirements specification document for the application.
  • Preparation of the Detailed Design document for the project by developing business process flows, requirements definition, use cases, and object model.
  • Used MVC architecture in the project development.
  • Worked on core Java for multithreading, arrays, and GUIs (AWT).
  • Experience in markup languages like HTML, DHTML, and XML, and in Cascading Style Sheets (CSS).
  • Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, design and development.
  • Involved in Servlet and JavaBean programming on the server side for communication between clients and the server.
  • Used CSS style sheets for presenting data from XML documents and data from databases to render on HTML web pages. Developed the client classes for the Web Service implementing SOAP.
  • Involved in development of a generic Data access object (DAO) layer module for user accounts and sales reporting using JDBC to interface with database systems running on Oracle.
  • Designed and implemented a GUI framework for Swing; developers using the framework define actions and popup menus in XML, and the framework builds the graphical components.
  • Developed web application using JSP Framework.
  • Configured Spring and EJB to manage Java beans and set their dependencies in a context file.
  • Experience with various JavaScript frameworks and technologies, i.e., jQuery, AJAX, JSON, and AngularJS.
  • Published and consumed Web Services using SOAP and WSDL and deployed them on the WebLogic web server.
  • Strong knowledge of SQL and PL/SQL and good at writing stored procedures and triggers in Oracle 8i/9i/10g.
  • Followed AGILE Methodology and SCRUM to deliver the product with cross-functional skills.

Environment: Java, J2EE, EJB, JNDI, JMS, JDBC, Servlets, JSP, XML, SAX, Design Patterns, MVC, Struts, CSS, HTML, DHTML, JavaScript 1.2, UML, JUnit, SOAP, WSDL, web services, OAS, Javadoc, VSS, Solaris 8, C++, MySQL 3.2.
