
Senior Big Data Engineer Resume


SUMMARY:

  • 8+ years of professional experience in IT, including work with Big Data and Hadoop ecosystem technologies.
  • Leveraged strong skills in developing applications involving Big Data technologies such as Hadoop, MapReduce, YARN, Flume, Hive, Pig, Sqoop, HBase, Kafka, Cloudera, MapR, Avro, Spark and Scala.
  • Extensively worked on major components of the Hadoop ecosystem such as HDFS, HBase, Hive, Sqoop, Pig and MapReduce.
  • Experience in building large-scale, highly available web applications, with working knowledge of web services and other integration patterns.
  • Experience with NoSQL databases, including table row-key design and loading and retrieving data for real-time processing, with performance improvements driven by data access patterns.
  • Knowledge of Hadoop architecture and its components, including HDFS, Job Tracker, Task Tracker, Resource Manager, Name Node, Data Node and MapReduce concepts.
  • Developed simple to complex MapReduce and streaming jobs in Java and Scala.
  • Developed Hive scripts for end-user/analyst requirements to perform ad hoc analysis.
  • Hands-on experience with Spark Core, Spark SQL and Spark Streaming, and with creating and handling DataFrames in Spark with Scala.
  • Extensive experience with Hadoop architecture and its various components, including HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce concepts.
  • Strong experience working with databases such as Teradata, and proficiency in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures and functions.
  • Extensive hands-on experience with distributed computing architectures such as AWS products (e.g., EC2, Redshift, EMR and Elasticsearch), Hadoop, Python and Spark, and effective use of Azure SQL Database, MapReduce, Hive, SQL and PySpark to solve big data problems.
  • Working knowledge of the AWS environment and Spark on AWS, with strong experience in cloud computing platforms.

PROFESSIONAL EXPERIENCE:

Confidential

Senior Big Data Engineer

Responsibilities:

  • Developed Spark scripts in Scala and Java as per the requirements. Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/Datasets/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries (see the aggregation sketch after this list), writing data back into the OLTP system through Sqoop. Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Implemented schema extraction for Parquet and Avro file formats in Hive. Experience converting existing AWS infrastructure to serverless architecture (AWS Lambda, Kinesis), deploying via Terraform and AWS CloudFormation templates. Good experience with Talend Open Studio for designing ETL jobs for data processing. Implemented partitioning, dynamic partitions and buckets in Hive (see the partitioning sketch after this list). Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Java to perform event-driven processing. Created Lambda jobs and configured roles using the AWS CLI. Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs. Worked extensively with Sqoop for importing metadata from Oracle. Used the variety of AWS computing and networking services to meet application needs. Wrote HiveQL as per the requirements, processed data in the Spark engine and stored the results in Hive tables.
  • Imported existing datasets from Oracle to the Hadoop system using Sqoop, and created Sqoop jobs with incremental load to populate Hive external tables. Wrote Spark Core programs for processing and cleansing data, thereafter loading that data into Hive or HBase for further processing.
  • Designed and developed Oracle PL/SQL and shell scripts for data import/export, data conversion and data cleansing. Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications. Developed Spark scripts to import large files from Amazon S3 buckets.
  • Used Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Python and NoSQL databases such as HBase and Cassandra. Experience in change implementation, monitoring and troubleshooting of AWS Snowflake databases and cluster-related issues. Created and managed cloud VMs with the AWS EC2 command-line clients and the AWS Management Console.
  • Migrated the on-premises database structure to the Confidential Redshift data warehouse. Worked on AWS Data Pipeline to configure data loads from S3 into Redshift. Understanding of the AWS product and service suite, primarily EC2, S3, VPC, Lambda, Redshift, Spectrum, Athena, EMR (Hadoop) and related monitoring services, their applicable use cases, best practices, implementation and support.
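
The Scala DataFrame/Dataset aggregation work mentioned earlier in this list could look roughly like the sketch below: a minimal Spark/Scala aggregation over a Hive table, written back to a Hive staging table that a downstream Sqoop export could push to the OLTP system. Application, table and column names are assumptions for illustration, not the actual project code.

```scala
// Minimal sketch (assumed table/column names) of a Spark/Scala aggregation
// over a Hive table, written back to a Hive staging table for a later
// Sqoop export to the OLTP system.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyOrderAggregates {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-order-aggregates")
      .enableHiveSupport()               // read/write Hive tables via the metastore
      .getOrCreate()

    val orders = spark.table("sales.orders")   // hypothetical Hive source table

    val daily = orders
      .groupBy(col("order_date"), col("region"))
      .agg(
        sum("amount").as("total_amount"),
        countDistinct("customer_id").as("distinct_customers"))

    // Staging table that a scheduled Sqoop export job could push downstream.
    daily.write.mode("overwrite").saveAsTable("sales.daily_order_aggregates")
    spark.stop()
  }
}
```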
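
The Hive partitioning and bucketing noted above could be expressed roughly as the following HiveQL, wrapped in spark.sql calls for consistency with the Spark/Scala stack; table names, columns and bucket counts are hypothetical.

```scala
// Rough illustration of dynamic partitioning and bucketing in Hive; the
// statements are plain HiveQL issued through spark.sql. All names are made up.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-partitioning-demo")
  .enableHiveSupport()
  .getOrCreate()

// Allow fully dynamic partition values on insert.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
  CREATE TABLE IF NOT EXISTS events_by_day (
    event_id STRING, user_id STRING, payload STRING)
  PARTITIONED BY (event_date STRING)
  STORED AS PARQUET
""")

// Dynamic-partition insert: the partition value comes from the SELECT output.
spark.sql("""
  INSERT OVERWRITE TABLE events_by_day PARTITION (event_date)
  SELECT event_id, user_id, payload, to_date(event_ts) AS event_date
  FROM raw_events
""")

// A bucketed variant for join/sampling performance (DDL only).
spark.sql("""
  CREATE TABLE IF NOT EXISTS users_bucketed (user_id STRING, country STRING)
  CLUSTERED BY (user_id) INTO 8 BUCKETS
  STORED AS ORC
""")
```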

Confidential

Senior Big Data Engineer

Responsibilities:

  • Used Spark and Hive to implement the transformations needed to join daily ingested data to historic data. Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time (a streaming sketch follows this list).
  • Used the Spark API over an EMR cluster with Hadoop YARN to perform analytics on data in Hive. Worked on loading data into Snowflake in the cloud from various sources, and integrated and automated data workloads to the Snowflake warehouse. Involved in functional, integration, regression, smoke and performance testing. Tested Hadoop MapReduce jobs developed in Python, Pig and Hive.
  • Extensively worked with Python and built the custom ingest framework. Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
  • Experience managing Azure Data Lake Storage (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure services. Knowledge of U-SQL. Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2).
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, and transformations during the ingestion process itself (a join-tuning sketch follows this list). Experienced in writing live real-time processing using Spark Streaming with Kafka. Designed and implemented Azure Site Recovery from on-premises to the cloud.
  • Experience loading XML and JSON files into NoSQL databases such as MarkLogic and MongoDB using Apache NiFi 1.8/1.3. Involved in business meetings to understand and analyze the Apache NiFi process group requirements. Developed reusable objects such as PL/SQL program units and libraries, database procedures and functions, and database triggers to be used by the team and to satisfy the business rules.
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, scikit-learn and NLTK in Python for developing various machine learning algorithms. Expertise in R, MATLAB, Python and their respective libraries. Used SQL Server Integration Services (SSIS) for extracting, transforming and loading data into target systems from multiple sources.
  • Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform and load data from different sources such as Azure SQL, Blob storage and Azure SQL Data Warehouse, including write-back. Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism and memory tuning. Implemented a Python script to call the Cassandra REST API, performed transformations and loaded the data into Hive. Created Cassandra tables to store various data.
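
As a rough illustration of the near-real-time Kafka ingestion described above, the sketch below consumes a Kafka topic with Spark Structured Streaming (the current API form of Spark Streaming) and lands the records on HDFS as Parquet. Broker addresses, topic names and paths are placeholders.

```scala
// Simplified sketch: consume a Kafka topic and land the records on HDFS as
// Parquet. Broker, topic and paths are placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
  .option("subscribe", "learner-events")
  .option("startingOffsets", "latest")
  .load()
  .selectExpr("CAST(key AS STRING) AS key",
              "CAST(value AS STRING) AS value",
              "timestamp")

val query = events.writeStream
  .format("parquet")
  .option("path", "hdfs:///data/raw/learner_events")
  .option("checkpointLocation", "hdfs:///checkpoints/learner_events")
  .trigger(Trigger.ProcessingTime("1 minute"))   // micro-batch every minute
  .start()

query.awaitTermination()
```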
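
The join and partition tuning mentioned above might look like the following sketch: broadcasting a small dimension table to avoid shuffling the large fact table, and raising shuffle parallelism. Paths, table shapes and the chosen partition counts are illustrative assumptions.

```scala
// Illustrative join tuning: broadcast the small dimension table so the large
// fact table is not shuffled, and raise shuffle parallelism for other stages.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, col}

val spark = SparkSession.builder()
  .appName("join-tuning-demo")
  .config("spark.sql.shuffle.partitions", "400")   // more partitions for big shuffles
  .getOrCreate()

val facts = spark.read.parquet("hdfs:///data/facts")        // large table (assumed path)
val dims  = spark.read.parquet("hdfs:///data/dim_country")  // small lookup table

// broadcast() ships the small side to every executor, avoiding a shuffle join.
val joined = facts.join(broadcast(dims), Seq("country_code"))

val result = joined
  .repartition(200, col("country_code"))   // control downstream partition sizes
  .cache()                                  // keep the hot working set in memory

println(result.count())
```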

Confidential

Big Data Engineer

Responsibilities:

  • Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, ZooKeeper and Sqoop.
  • Migrated data from FS to Snowflake within the organization. Imported legacy data from SQL Server and Teradata into Amazon S3. Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Created consumption views on top of metrics to reduce the running time of complex queries. Exported data into Snowflake by creating staging tables to load files of different formats from Amazon S3.
  • Installed and configured Sqoop to import and export data between Hive and relational databases. Working experience with data streaming using Kafka, Apache Spark and Hive. Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data into HDFS for analysis.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources; data sources are extracted, transformed and loaded to generate CSV data files with Python programming and SQL queries (see the regex sketch after this list). Integrated HDP clusters with Active Directory and enabled Kerberos for authentication.
  • Worked on Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver Monitoring and Cloud Deployment Manager. Set up alerting and monitoring using Stackdriver in GCP. Designed and implemented large-scale distributed solutions in AWS.
  • Closely monitored and analyzed MapReduce job executions on the cluster at task level and optimized Hadoop cluster components to achieve high performance. Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Monitored resources and applications using AWS CloudWatch, including creating alarms on metrics for EBS, EC2, ELB, RDS, S3 and SNS, and configured notifications for the alarms generated based on defined events. Automated cloud deployments using Chef, Python and AWS CloudFormation templates.
  • Analyzed the Hadoop cluster and different big data analytic tools including Pig and Hive. Monitored Hadoop cluster health through MCS and worked on NoSQL databases including HBase. Created Hive tables, loaded data, wrote Hive UDFs (a UDF sketch follows this list) and worked with the Linux server admin team in administering the server hardware and operating system.
  • Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis. Worked on NoSQL databases.
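
The regex extraction project referenced above could be sketched in Spark/Scala as follows; the log format, pattern and paths are invented for illustration only.

```scala
// Loose sketch of a Spark/Scala regex extraction that turns raw log lines
// into CSV output. The log format, pattern and paths are invented.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("regex-extract").getOrCreate()
import spark.implicits._

// Example line: "2021-03-04 10:15:22 INFO user=42 action=login"
val logPattern = """^(\S+ \S+) (\w+) user=(\d+) action=(\w+)""".r

val parsed = spark.read.textFile("hdfs:///logs/app/*.log")
  .flatMap {
    case logPattern(ts, level, user, action) => Some((ts, level, user.toLong, action))
    case _                                   => None   // drop lines that do not match
  }
  .toDF("event_ts", "level", "user_id", "action")

parsed.write.mode("overwrite")
  .option("header", "true")
  .csv("hdfs:///data/out/user_actions_csv")
```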
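
For the Hive UDF work noted above, a minimal UDF written in Scala against the hive-exec API might look like this; the class name, masking logic and registration statements are hypothetical.

```scala
// Minimal Hive UDF in Scala: masks all but the last four characters of an ID.
// Class name and logic are hypothetical; built against hive-exec.
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

class MaskId extends UDF {
  // Hive calls evaluate() once per input value.
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val masked = ("*" * math.max(0, s.length - 4)) + s.takeRight(4)
    new Text(masked)
  }
}

// Registration from the Hive CLI / Beeline after packaging as a jar:
//   ADD JAR hdfs:///jars/mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_id AS 'MaskId';
//   SELECT mask_id(customer_id) FROM customers LIMIT 10;
```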

Confidential

Hadoop-Spark Developer

Responsibilities:

  • Involved in the complete big data flow of the application, from data ingestion from upstream into HDFS through processing and analyzing the data in HDFS. Troubleshot Azure development, configuration and performance issues, and interacted with the multiple teams responsible for the Azure platform to fix Azure platform bugs. Deployed the initial Azure components such as Azure Virtual Networks, Azure Application Gateway, Azure Storage and affinity groups.
  • Wrote a Kafka REST API to collect events from the front end. Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive. Created metric tables and end-user views in Snowflake to feed data for Tableau refreshes. Developed and implemented Apache NiFi across various environments and wrote QA scripts in Python for tracking files. Developed shell scripts for running Hive scripts in Hive and Impala.
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables. Imported, cleaned, filtered and analyzed data using tools such as SQL, Hive and Pig. Ran the Hive scripts through Hive, Hive on Spark and some through Spark SQL. Involved in importing real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
  • Involved in creating UNIX shell scripts. Performed defragmentation of tables, partitioning, compression and indexing for improved performance and efficiency. Rapid model creation in Python using pandas, NumPy, sklearn and plot.ly for data visualization; these models are then implemented in SAS, where they are interfaced with MSSQL databases and scheduled to update on a timely basis.
  • Experience designing and developing applications leveraging MongoDB. Implemented a prototype for the complete requirements using Splunk, Python and machine learning concepts. Worked on migrating MapReduce programs into Spark transformations using Scala. Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework. Collected JSON data from the HTTP source and developed Spark APIs that perform inserts and updates in Hive tables (a sketch follows this list). Used Jira for bug tracking and Bitbucket to check in and check out code changes.
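
The JSON-to-Hive insert/update path mentioned above could be sketched as below, assuming the JSON collected from the HTTP source is landed as files first. Hive's dynamic INSERT OVERWRITE rewrites only the partitions that receive rows, which approximates update behavior on a partitioned table; all names and paths are placeholders.

```scala
// Hypothetical sketch of the JSON-to-Hive path: keep the newest record per
// business key and rewrite only the affected Hive partitions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("json-upsert")
  .enableHiveSupport()
  .getOrCreate()

val incoming = spark.read.json("hdfs:///landing/http_events/2021-06-01/")

// Keep only the latest record per business key to emulate an "update".
val latest = incoming
  .withColumn("rn", row_number().over(
    Window.partitionBy("record_id").orderBy(col("updated_at").desc)))
  .filter(col("rn") === 1)
  .drop("rn")

latest.createOrReplaceTempView("latest_events")

// Dynamic partition overwrite replaces only the partitions present in the
// SELECT output, leaving untouched partitions intact.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
  INSERT OVERWRITE TABLE analytics.http_events PARTITION (event_date)
  SELECT record_id, payload, updated_at, event_date
  FROM latest_events
""")
```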

Environment: Hadoop, Scala, Azure, HDFS, YARN, MapReduce, Hive, Sqoop, Flume, NiFi, Oozie, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting.

Confidential

Java/ Hadoop Developer

Responsibilities:

  • Involved in review of functional and non-functional requirements. Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs. Implemented patterns such as Singleton, Factory, Facade, Prototype, Decorator, Business Delegate and MVC. Involved in frequent meetings with clients to gather business requirements and convert them into technical specifications for the development team. Adopted agile methodology with pair programming and addressed issues during system testing.
  • Involved in creating Hive tables, loading the data and writing Hive queries that run internally as MapReduce. Developed a custom file system plugin for Hadoop so it can access files on the data platform; the plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
  • Developed the application in the Eclipse IDE. Experience developing Spring Boot applications for transformations. Primarily involved in front-end UI using HTML5, CSS3, JavaScript, jQuery and AJAX. Used the Struts framework to build an MVC architecture and separate presentation from business logic. Involved in rewriting the middle tier on the WebLogic application server. Actively involved in code reviews and coding standards, unit testing and integration testing.
  • Imported and exported data between HDFS and Oracle Database using Sqoop. Installed and configured Pig and wrote Pig Latin scripts, including MapReduce jobs written in Pig Latin. Involved in ETL, data integration and migration. Imported data using Sqoop to load data from Oracle to HDFS on a regular basis. Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements, creating Hive tables and working on them using HiveQL. Experienced in defining job flows. Utilized utilities such as Struts Tag Libraries, JSP, JavaScript, HTML and CSS. Built and deployed WAR files on the WebSphere application server.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system. Set up and benchmarked Hadoop/HBase clusters for internal use.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, Cloudera distribution of Hadoop, Pig, HBase, Linux, XML, Eclipse, Oracle, PL/SQL, MongoDB, Toad.
