Senior Data Engineer Resume

SUMMARY:

8+ years of overall experience in the IT industry, including Big Data technologies and web applications in multi-tiered environments using Hadoop, Spark, Hive, Pig, Sqoop, J2EE (Spring, JSP, Servlets), JDBC, HTML, CSS, and JavaScript (AngularJS).
Hands-on experience with Spark Core, Spark SQL, and Spark Streaming, and with creating and handling DataFrames in Spark with Scala.
Working knowledge of the AWS environment and Spark on AWS, with strong experience in cloud computing platforms such as AWS.
Migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets; experienced in maintaining Hadoop clusters on AWS EMR.
Experience with NoSQL databases, including table row key design and loading and retrieving data for real-time processing, with performance improvements based on data access patterns.
Used EMR with Hive to handle lower-priority bulk ETL jobs.
Extensive experience in Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
Experience in building large-scale, highly available web applications. Working knowledge of web services and other integration patterns.
Developed simple to complex MapReduce and streaming jobs using Java and Scala.
Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis.
Experience with Amazon Web Services (AWS) cloud services such as S3, EC2, and EMR, and with Microsoft Azure.
Good working knowledge of the Amazon Web Services (AWS) cloud platform, including EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
Expertise in Azure infrastructure m

PROFESSIONAL EXPERIENCE:

Confidential

Senior Data Engineer

Responsibilities:

Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins, transformations, and other operations during the ingestion process itself.
Developed Spark scripts using Scala and Java as per the requirements.
Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Experience in converting existing AWS infrastructure to serverless architecture (AWS Lambda, Kinesis), deploying via Terraform and AWS CloudFormation templates.
Good experience with Talend Open Studio for designing ETL jobs for data processing.
Implemented partitioning, dynamic partitions, and buckets in Hive.
Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
Migrated an existing on-premises application to AWS.
Used CloudWatch Logs to move application logs to S3 and created alarms based on a few exceptions raised by applications.
Developed a Spark Streaming pipeline in Java to parse JSON data and store it in Hive tables.
Worked extensively with Sqoop for importing metadata from Oracle.
Involved in creating Hive tables and in loading and analyzing data using Hive queries.
Developed Hive queries to process the data and generate data cubes for visualization.
Implemented schema extraction for Parquet and Avro file formats in Hive.
Used Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Python and to NoSQL databases such as HBase and Cassandra.
Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
Used AWS computing and networking services to meet application needs.
Wrote HiveQL as per the requirements, processed data in the Spark engine, and stored it in Hive tables.
Worked on data that combined unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
Responsible for importing data from Postgres to HDFS and Hive using Sqoop.
Experienced in migrating HiveQL to Impala to minimize query response time.
Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
Imported existing datasets from Oracle into the Hadoop system using Sqoop.
Created Sqoop jobs with incremental load to populate Hive external tables.
Wrote Spark Core programs for processing and cleansing data and then loading that data into Hive or HBase for further proces

Confidential

Big Data Engineer

Responsibilities:

Experience in job management using the Fair Scheduler; developed job processing scripts using Oozie workflows.
Used Spark and Hive to implement the transformations needed to join the daily ingested data to historic data.
Used Spark Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model, which gets the data from Kafka in near real time.
Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
Developed reusable transformations to load data from flat files and other data sources to the data warehouse.
Assisted the operations support team with transactional data loads by developing SQL*Loader and Unix scripts.
Implemented a Python script to call the Cassandra REST API, performed transformations, and loaded the data into Hive.
Worked extensively with Python and built the custom ingest framework.
Experience managing Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate with other Azure services. Knowledge of U-SQL.
Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins, transformations, and other operations during the ingestion process itself.
Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Java to perform event-driven processing.
Created Lambda jobs and configured roles using the AWS CLI.
Developed Spark scripts using Scala shell commands as per the requirements.
Used the Spark API over an EMR cluster with Hadoop YARN to perform analytics on data in Hive.
Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDDs in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, and in the reverse direction.
Experienced in performance tuning of Spark applications by setting the right batch interval time, the correct level of parallelism, and memory tuning.
Experienced in writing real-time processing using Spark Streaming with Kafka.
Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python scikit-learn.
Designed and developed architecture for a data services ecosystem spanning relational, NoSQL, and Big Data technologies.
Used SQL Server Integration Services (SSIS) for extraction, transformation, and loading of data into the target system from multiple sources.
Created Cassandra tables to store various formats of data coming from different sources.
Designed, developed data integration programs in a Hadoop environment with NoSQL data store Cassan

Confidential

Hadoop & Spark Developer

Responsibilities:

Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, ZooKeeper, and Sqoop.
Migrated data from FS to Snowflake within the organization.
Imported legacy data from SQL Server and Teradata into Amazon S3.
Created consumption views on top of metrics to reduce the running time of complex queries.
Exported data into Snowflake by creating staging tables to load data of different files from Amazon S3.
Installed and configured Sqoop to import and export data between relational databases and Hive.
Implemented partitioning, dynamic partitions, and buckets in Hive.
Evaluated existing infrastructure, systems, and technologies; provided gap analysis; documented requirements, evaluation, and recommendations for systems, upgrades, and technologies; and created proposed architecture and specifications along with recommendations.
Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data into HDFS for analysis.
Integrated HDP clusters with Active Directory and enabled Kerberos for authentication.
Worked on Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver Monitoring, and Cloud Deployment Manager.
Set up alerting and monitoring using Stackdriver in GCP.
Designed and implemented large-scale distributed solutions in AWS and GCP clouds.
Monitored Hadoop cluster functioning through MCS and worked on NoSQL databases including HBase.
Used Hive, created Hive tables, and was involved in data loading and writing Hive UDFs; worked with the Linux server admin team in administering the server hardware and operating system.
Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores for data access and analysis.
Administered large Hadoop environments, including cluster build and support, setup, performance tuning, and monitoring in an enterprise environment.
Closely monitored and analyzed MapReduce job executions on the cluster at the task level and optimized Hadoop cluster components to achieve high performance.
Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Worked on analyzing the Hadoop cluster and different big data analytics tools including Pig and Hive.
Working experience with data streaming processes using Kafka, Apache Spark, and Hive.
Worked with various HDFS file formats such as Avro, SequenceFile, NiFi, and JSON, and various compression formats such as Snappy and bzip2.
Used Spark Streaming APIs to perform necessary transformations and actions on the data received from Kafka.
Developed Spark code using Scala and Spa

JK Infotech

Data Engineer

Roles & Responsibilities:
• Involved in running Hive scripts through Hive, Hive on Spark, and some through Spark SQL.
• Involved in importing real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
• Troubleshot Azure development, configuration, and performance issues.
• Interacted with multiple teams responsible for the Azure platform to fix Azure platform bugs.
• Provided 24/7 on-call support for Azure configuration and performance issues.
• Deployed the initial Azure components such as Azure Virtual Networks, Azure Application Gateway, Azure Storage, and Affinity Groups.
• Wrote a Kafka REST API to collect events from the front end.
• Worked on retrieving data from FS to S3 using Spark commands.
• Built S3 buckets, managed policies for S3 buckets, and used S3 and Glacier for storage and backup on AWS.
• Created metric tables and end user views in Snowflake to feed data for Tableau refresh.
• Involved in the complete Big Data flow of the application, from data ingestion from upstream to HDFS through processing and analyzing the data in HDFS.
• Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers.
• Developed a data pipeline using Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
• Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables.
• Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
• Developed and implemented Apache NiFi across various environments and wrote QA scripts in Python for tracking files.
• Implemented a prototype for the complete requirements using Splunk, Python, and machine learning concepts.
• Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
• Knowledge of handling Hive queries using Spark SQL that integrate with the Spark environment.
• Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
• Worked on product positioning and messaging that differentiate Hortonworks in the open source space.
• Experience in designing and developing applications leveraging MongoDB.
• Collected JSON data from an HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
• Developed Spark scripts to import large files from Amazon S3 buckets.
• Developed shell scripts for running Hive scripts in Hive and Impala.
• Used Jira for bug tracking and Bitbucket to check in and check out code changes.
Environment: Scala, Azure, HDFS, AWS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting.

Prolifics

Java Developer

Roles & Responsibilities:
• Developed microservices using RESTful services to provide all the CRUD capabilities.
• Created requirement documents and designed the requirements using UML diagrams, class diagrams, and use case diagrams for new enhancements.
• Used the JBoss application server for deployment of applications.
• Developed communication among SOA services.
• Involved in creating both service and client code for JAX-WS and used SoapUI to generate proxy code from the WSDL to consume the remote service.
• Designed the user interface of the application using HTML5, CSS3, JavaScript, AngularJS, and AJAX.
• Implemented the application using Agile methodology. Involved in daily scrum and sprint planning meetings.
• Actively involved in analysis, detailed design, development, bug fixing, and enhancement.
• Drove the technical design of the application by collecting requirements from the Functional Unit in the design phase of the SDLC.
• Worked with SessionFactory, ORM mapping, transactions, and HQL in the Hibernate framework.
• Used RESTful web services for sending and getting data from different applications.
• Wrote client-side and server-side validations using JavaScript.
• Wrote stored procedures and complex SQL queries for backend operations with the database.
• Devised a logging mechanism using Log4j.
• Used GitHub as the version control system.
• Created tracking sheets for tasks and generated timely reports on task progress.
Environment: Java, J2EE, Java Swing, HTML, JavaScript, AngularJS, Node.js, JDBC, JSP, Servlet, UML, Hibernate, XML, JBoss, SDLC methodologies, Log4j, GitHub, RESTful, JAX-RS, JAX-WS, Eclipse IDE.
