
Sr Bigdata Engineer Resume

Charlotte, NC

SUMMARY:

  • Hadoop Developer with more than 13 years of industry experience delivering business intelligence and data science solutions for transformational and strategic growth initiatives using Big Data and AWS technologies.
  • Ability to partner with business leaders by providing thought leadership, vision and roadmap for BI and Big Data initiatives.
  • Successful implementations of Big Data initiatives in the banking, manufacturing and mining industries, delivered on time and within budget using Agile methodology.
  • Experience in application analysis, design, development, maintenance and support of web, client-server and distributed applications, including 8 years of experience with Big Data and Hadoop components such as HDFS, MapReduce, Pig, Hive, Impala, YARN, Sqoop, Flume, Oozie, Spark and PySpark, the AWS stack (S3, EC2, EMR, Lambda, Step Functions, Glue, DynamoDB, Athena, RDS, Redshift, Postgres, Data Pipeline), plus Python, GitLab, GCP and Airflow.
  • Experience working with Spark DataFrames in Python.
  • Experience with both on-premise and cloud clusters.
  • Experience working with REST APIs on AWS.
  • Solid experience in architecting, designing and operationalizing large-scale data and analytics solutions on the Snowflake Cloud Data Warehouse.
  • Experience in multiple Hadoop distributions like Cloudera and MapR.
  • Extensive experience writing SQL queries using HiveQL to perform analytics on structured data.
  • Experience in performing data validation using Hive dynamic partitioning and bucketing.
  • Expertise in data ingestion using Sqoop and Flume.
  • Excellent understanding of and experience with the NoSQL database HBase.
  • Developed a data ingestion framework in Python (see the illustrative sketch following this summary).
  • Hands-on data engineering experience with Python, SQL, relational databases, and data warehouse / data lake environments.
  • Expert in writing, analyzing and tuning complex SQL queries
  • Experienced in extracting data from a variety of sources, integrating it and preparing data models for optimal data delivery.
  • Strong functional and object-oriented programming experience in Python.
  • Experience in creating and automating ETL pipelines and robust data workflows.
  • Experience in developing and consuming RESTful APIs and integrating applications with other components using Python.
  • Expert in packaging Python code for releases and deployments using DevOps tools (Git, Jenkins).
  • Excellent written and verbal communication and interpersonal skills, able to effectively collaborate with technical and business partners
  • Experience working with various file formats such as Avro, ORC and Parquet.
  • Implemented business logic using Pig scripts and custom Pig UDFs, performing ETL operations to join, clean, aggregate and analyze data.
  • Experience with the Oozie workflow engine to automate and parallelize Hadoop MapReduce and Pig jobs.
  • Experience with ETL tools such as Talend to create structures for data coming from mainframes.
  • Experience in getting data from various sources into HDFS and building reports using Tableau and QlikView
  • Experience working in environments using Agile (SCRUM) and Waterfall methodologies.
  • Expertise in database design and development using SQL, PL/SQL, MySQL and Teradata.
  • Performed design and development activities using Talend Big Data and Data Integration tools across various projects, creating reusable ETL components, writing Java code in an IDE, and building standard and big data ETL jobs that use the MapReduce and Spark processing frameworks to process huge volumes of data on the Hadoop MapR distribution.
  • Involved in Code Optimization and performance tuning of ETL jobs and HQL queries.
  • Experience in automating processes through Unix shell/Perl scripting and scheduling tools.
  • Experience in analyzing, storing and managing data using the Big Data ecosystem, including Hive, Spark, NoSQL, Exadata and Teradata environments.
  • Wrote unit test cases and performed testing based on project requirements.
  • Supported business and functional acceptance testing in DEV and QA environments.
  • Debugged Talend code and ETL procedures, resolved compilation errors, reviewed code and identified areas of concern.
  • Documented ETL orchestration and processes, including technical designs, workflows, functions and dependencies.
  • Played different roles as a Developer, a Team Lead, and a Technical Lead.
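
Illustrative sketch (referenced in the Python ingestion framework bullet above): a minimal, hypothetical example of the kind of PySpark ingestion step that lands a delimited feed into a dynamically partitioned Hive table; the table, path and column names are placeholders rather than details from any actual project.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("ingest_customer_feed")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Dynamic partition settings matter when inserting into existing partitioned tables.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    # Read a delimited feed; delimiter, header flag and location would normally come
    # from job configuration rather than being hard-coded.
    feed = (
        spark.read
        .option("header", "true")
        .option("delimiter", "|")
        .csv("/data/landing/customer_feed/")
    )

    # Write to a Hive table partitioned by load_date (assumed to be a column in the
    # feed) so validation queries such as row counts can target a single partition.
    (
        feed.write
        .mode("overwrite")
        .partitionBy("load_date")
        .saveAsTable("staging.customer_feed")
    )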

TECHNICAL SKILLS:

Methodologies: Agile, Waterfall

Cloud Solution: AWS (S3, EC2, EMR, Lambda, Step Functions, CloudWatch, RDS, Redshift, Data Pipeline, DynamoDB), MySQL, Postgres and Shell Scripting

Hadoop Tools: HDFS, Map Reduce, YARN, HIVE, PIG, SQOOP, Impala, HBase, Spark, Python, Scala

Functional Tools: Tableau

Data Integration: Talend With Big Data

DW Appliance: Teradata

Databases: MS SQL Server 2016, Oracle 12c, Sybase 11

Operating System: Windows, Linux, Unix

PROFESSIONAL EXPERIENCE:

Confidential, Charlotte, NC

Sr BigData Engineer

Responsibilities:

  • Involved in ingesting data received from various data sources such as SFTP, SQL Server and Oracle.
  • Developed scripts for the ingestion framework and scheduled batch and real-time jobs in production through a poller.
  • Created complex Pig scripts to transform data and apply business rules.
  • Handled projects on the Prudential big data platform, leading a 20-member offshore team.
  • Actively involved in all Prudential big data initiatives with clients and presented the project footprint.
  • Managed and reviewed Hadoop log files to identify issues when jobs fail.
  • Followed a parameterized approach for schema details, file locations and delimiters to make the code efficient and reusable.
  • Wrote shell scripts to schedule and automate regular daily tasks.
  • Wrote Spark programs in PySpark and Scala for data ingestion and transformations.
  • Tested, built and deployed applications.
  • Automated the process of copying files into the Hadoop system at regular intervals for testing purposes.
  • Created Hive and HBase tables, loaded them with data and transformed it using Scala.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Shared responsibility for administration of Apache Hadoop and Hive.
  • Created a variety of POCs to build tools with the latest technologies.
  • Performed real-time data ingestion using Spark Streaming (see the illustrative sketch after the environment list below).
  • Implemented cloud solutions on AWS using different services for GBTS customers.

Technical Environment: Cloudera, Hive, Sqoop, Linux, AutoSys, Oracle, Sql Server, HiveQL, Talend, MySQL, Python, Spark, Kafka, Scala, AWS Ec2, S3, EMR, Lambda, Step Functions, Cloud Watch, Cloud Trail, Dynamo DB, Postgres, Athena and Glue
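
Illustrative sketch (referenced in the Spark Streaming bullet above): a minimal PySpark Structured Streaming job of the kind described, reading events from Kafka and landing them as Parquet; the broker address, topic and paths are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("stream_ingest").getOrCreate()

    # Subscribe to a Kafka topic; the spark-sql-kafka package must be on the classpath.
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "events_topic")
        .load()
    )

    # Kafka delivers key/value as binary, so cast the payload to string before parsing.
    payload = events.select(col("value").cast("string").alias("json_payload"))

    # Land micro-batches as Parquet with a checkpoint location for fault tolerance.
    query = (
        payload.writeStream
        .format("parquet")
        .option("path", "/data/raw/events/")
        .option("checkpointLocation", "/data/checkpoints/events/")
        .start()
    )
    query.awaitTermination()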

Confidential, Pleasanton, CA

Sr BigData Engineer

Responsibilities:

  • Implemented 1,200 Dispatch metrics for six mine sites using EMR, Spark and Step Functions.
  • Automated AWS data ingestion from Oracle and SQL Server data sources using Sqoop and Step Functions.
  • Developed AWS Lambda functions and Step Functions workflows to ingest data from the JIGSAW application for different sites (see the illustrative sketch after the environment list below).
  • Developed Python and Spark programs to transform data using Spark SQL.
  • Developed SQL scripts to transform and integrate data received from different feeds (Oracle, MS SQL Server and flat files) and loaded it into AWS S3 buckets.
  • Developed an AWS CloudWatch Events serverless solution to schedule, execute and monitor jobs.
  • Worked on performance tuning of the data ingestion jobs by tuning Sqoop parameters, Oracle SQL scripts and T-SQL scripts to meet operational production requirements.
  • Performed real-time data ingestion using Spark Streaming.

Environment: Cloudera, AWS, Shell Script, Python, Spark, Hive, Sqoop, MR, Linux, Git Lab, Dynamo DB, RDS
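
Illustrative sketch (referenced in the Lambda/Step Functions bullet above): a minimal, hypothetical AWS Lambda handler that starts a Step Functions execution for one site's ingestion run via boto3; the state machine ARN environment variable and the event fields are placeholders.

    import json
    import os

    import boto3

    sfn = boto3.client("stepfunctions")


    def lambda_handler(event, context):
        # The triggering event (e.g. a CloudWatch Events schedule) names the mine site
        # and business date to ingest; the defaults here are placeholders.
        site = event.get("site", "site_a")
        run_date = event.get("run_date", "2020-01-01")

        # Kick off the ingestion state machine and return its execution ARN.
        response = sfn.start_execution(
            stateMachineArn=os.environ["INGEST_STATE_MACHINE_ARN"],
            name=f"ingest-{site}-{run_date}",
            input=json.dumps({"site": site, "run_date": run_date}),
        )
        return {"executionArn": response["executionArn"]}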

Confidential

Sr BigData Engineer

Responsibilities:

  • Migrated over 50 TB of data from SAP to Hadoop.
  • Built summarized data cuts for data scientists to build models in Spark Beyond (see the illustrative sketch after the environment list below).
  • Involved in ingesting data received from various data sources such as SFTP, mainframe, Sybase and SAP.
  • Developed scripts for data ingestion and scheduled batch jobs in production through Redwood.
  • Handled ingestion of log files from Splunk into the big data platform.
  • Managed and reviewed Hadoop log files to identify issues when jobs fail.
  • Followed a parameterized approach for schema details, file locations and delimiters to make the code efficient and reusable.
  • Wrote shell scripts to schedule and automate regular daily tasks.
  • Tested, built and deployed applications on Linux.
  • Automated the process of copying files into the Hadoop system at regular intervals for testing purposes.
  • Created Hive tables and loaded them with data.
  • Wrote Oozie workflows.
  • Involved in production deployment and issue resolution during the warranty period.
  • Created RFCs and Incidents using ServiceNow.

Environment: SAP, Redwood, Cloudera, Hive, Spark, Shell Script, Python, Oracle, Linux
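
Illustrative sketch (referenced in the summarized data cuts bullet above): a minimal PySpark aggregation that condenses a detailed Hive table into a smaller modelling dataset; the table and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("summary_cut")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Detailed source table loaded from SAP into the Hive staging area.
    orders = spark.table("sap_stage.sales_orders")

    # One row per customer and month, with the measures the modelling team needs.
    summary = (
        orders
        .groupBy(
            "customer_id",
            F.date_format("order_date", "yyyy-MM").alias("order_month"),
        )
        .agg(
            F.count("*").alias("order_count"),
            F.sum("net_value").alias("total_net_value"),
        )
    )

    summary.write.mode("overwrite").saveAsTable("analytics.sales_orders_monthly")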

Confidential, Phoenix, AZ

Sr BigData Engineer

Responsibilities:

  • Involved in ingesting data received from various data sources such as SFTP, mainframe and Teradata.
  • Developed scripts for the Cornerstone inbuilt Confidential tool for ingestion and scheduled batch jobs in production through a poller.
  • Created complex Pig scripts to transform data and apply business rules in both CS2.0 and CS3.0.
  • Handled the cr20220 Confidential project on the Confidential big data platform, leading a 20-member offshore team.
  • Actively involved in all Confidential big data initiatives with clients and presented the project footprint.
  • Received several appreciation mails from the clients for the successful implementations.
  • Managed and reviewed Hadoop log files to identify issues when jobs fail.
  • Followed a parameterized approach for schema details, file locations and delimiters to make the code efficient and reusable (see the illustrative sketch after the environment list below).
  • Wrote shell scripts to schedule and automate regular daily tasks.
  • Tested, built and deployed applications on Linux.
  • Automated the process of copying files into the Hadoop system at regular intervals for testing purposes.
  • Created Hive tables and loaded them with data.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Shared responsibility for administration of Apache Hadoop and Hive.
  • Created reports in Tableau using HiveQL.

Technical Environment: MapR, Java, Hive, Linux, Oozie, Mainframe, HiveQL, Eclipse, Maven, Teradata, Talend, MySQL, Python, Cornerstone
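
Illustrative sketch (referenced in the parameterized-approach bullet above): a minimal, hypothetical example of driving the schema, file location and delimiter from a JSON config file so the PySpark ingestion code stays reusable across feeds.

    import json

    from pyspark.sql import SparkSession


    def load_feed(spark, config_path):
        """Read a delimited feed described by a JSON config entry."""
        with open(config_path) as handle:
            cfg = json.load(handle)

        return (
            spark.read
            .option("delimiter", cfg["delimiter"])
            .option("header", str(cfg.get("header", True)).lower())
            .schema(cfg["schema_ddl"])   # e.g. "id INT, name STRING, amount DOUBLE"
            .csv(cfg["file_location"])
        )


    if __name__ == "__main__":
        spark = SparkSession.builder.appName("param_ingest").getOrCreate()
        df = load_feed(spark, "feed_config.json")
        df.show(5)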

Confidential

Hadoop Developer

Responsibilities:

  • Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig and Hive.
  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Involved in installing Hadoop Ecosystem components.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Wrote MapReduce jobs using the Java API for data analysis and dimension/fact generation (a Python streaming stand-in is sketched after the environment list below).
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Wrote MapReduce jobs using Pig Latin.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Developed Java MapReduce programs to transform mainframe data into a structured format.
  • Performed data analysis in Hive by creating tables, loading them with data and writing Hive queries that internally run as MapReduce jobs.
  • Developed optimal strategies for distributing the mainframe data over the cluster, importing and exporting the stored mainframe data into HDFS and Hive.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Used the HBase API to store data from Hive tables into HBase tables.
  • Wrote Hive queries to join multiple tables based on business requirements.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Built applications using Maven and integrated them with CI servers such as Jenkins to build jobs.
  • Conducted a POC for Hadoop and Spark as part of the NextGen platform implementation.
  • Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.

Technical Environment: CDH4, Java, MapReduce, HDFS, Hive, Pig, Linux, MySQL, MySQL Workbench, Maven, Java 6, Eclipse, PL/SQL
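
Illustrative sketch (referenced in the MapReduce bullet above): a Hadoop Streaming stand-in written in Python for the kind of dimension/fact aggregation those jobs performed; the original work used the Java MapReduce API, and the field layout and pipe delimiter here are hypothetical.

    #!/usr/bin/env python
    import sys


    def mapper():
        # Emit (dimension_key, amount) for every well-formed pipe-delimited record.
        for line in sys.stdin:
            fields = line.rstrip("\n").split("|")
            if len(fields) >= 3:
                print(f"{fields[0]}\t{fields[2]}")


    def reducer():
        # Streaming input arrives sorted by key, so totals accumulate per key.
        current_key, total = None, 0.0
        for line in sys.stdin:
            key, value = line.rstrip("\n").split("\t")
            if current_key is not None and key != current_key:
                print(f"{current_key}\t{total}")
                total = 0.0
            current_key = key
            total += float(value)
        if current_key is not None:
            print(f"{current_key}\t{total}")


    if __name__ == "__main__":
        # Invoked as "streaming_job.py map" or "streaming_job.py reduce" from the
        # hadoop-streaming command line.
        mapper() if sys.argv[1] == "map" else reducer()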

Confidential

Sr Software Engineer (On Hadoop)

Responsibilities:

  • Worked on reading multiple data formats on HDFS using MapReduce.
  • Experience with Hadoop ecosystem components HDFS, MapReduce, Hive, Pig, Sqoop and HBase.
  • Involved in loading data from UNIX file system to HDFS.
  • Used shell scripting and ETL tool Talend to pump data onto HDFS.
  • Extracted the data from Databases into HDFS using Sqoop.
  • Handled importing of data from various data sources, performed transformations using Hive and Pig.
  • Involved in analysis, design, testing phases and responsible for documenting technical specifications.
  • Very good understanding of partitioning and bucketing concepts and of managed and external tables in Hive to optimize performance.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on the data.
  • Worked on data serialization, converting complex objects into byte sequences using the Avro, Parquet and JSON formats (see the illustrative sketch after the environment list below).
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Optimized MapReduce code and Pig scripts and performed performance tuning and analysis.
  • Experience in Hive partitioning and bucketing, performing joins on Hive tables, and utilizing Hive SerDes such as RegEx, JSON and Avro.
  • Involved in writing shell scripts for exporting log files to the Hadoop cluster through an automated process.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Experience in debugging code and analyzing data for various reporting and analysis purposes.
  • Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
  • Moved relational database data into Hive dynamic partition tables using Sqoop and staging tables.
  • Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop jobs.

Environment: CDH4, Hadoop, HDFS, MapReduce, Sqoop, Teradata, Linux, Shell scripting, Java, Hive, Pig and SQL.
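
Illustrative sketch (referenced in the serialization bullet above): a minimal PySpark job converting raw JSON into a partitioned, bucketed Parquet table of the kind used for bucket-based Hive joins; paths, table and column names are hypothetical.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("convert_to_parquet")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Raw, semi-structured JSON records landed by the ingestion jobs.
    raw = spark.read.json("/data/raw/customers/")

    # Partition by state and bucket by customer_id so Hive can use bucket map joins.
    (
        raw.write
        .mode("overwrite")
        .format("parquet")
        .partitionBy("state")
        .bucketBy(8, "customer_id")
        .sortBy("customer_id")
        .saveAsTable("curated.customers_parquet")
    )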

Confidential

Software Engineer (On Java)

Responsibilities:

  • Developed front-end screens using JSP, HTML and CSS.
  • Developed server-side code using Struts and Servlets.
  • Developed core java classes for exceptions, utility classes, business delegate, and test cases.
  • Developed SQL queries using MySQL and established connectivity.
  • Worked with Eclipse using Maven plugin for Eclipse IDE.
  • Designed the user interface of the application using HTML5, CSS3, Java Server Faces 2.0 (JSF 2.0), JSP and JavaScript.
  • Tested the application functionality with JUnit Test Cases.
  • Developed all the User Interfaces using JSP and Struts framework.
  • Wrote client-side validations using JavaScript.
  • Extensively used jQuery for developing interactive web pages.
  • Developed the DAO layer using Hibernate and used Hibernate's caching system for real-time performance.
  • Experience in developing web services for production systems using SOAP and WSDL.
  • Developed the user interface presentation screens using HTML, XML, and CSS.
  • Experience working with Spring using AOP, IoC and JDBC templates.
  • Developed shell scripts to trigger the Java batch job and send summary emails with the batch job status and processing summary.
  • Coordinated with the QA lead on the development of test plans, test cases and test code, as well as actual testing; responsible for defect allocation and ensuring defects were resolved.
  • The application was developed in Eclipse IDE and was deployed on Tomcat server.
  • Involved in scrum methodology.
  • Provided support for bug fixes and functionality changes.

Environment: Java, Struts 1.1, Servlets, JSP, HTML, CSS, JavaScript, Eclipse 3.2, Tomcat, Maven 2.x, MySQL, Spring, Hibernate, REST, Windows and Linux, JUnit.
