Sr. Big Data Engineer Resume

SUMMARY:

7 years of progressive IT experience in analysis, enterprise application development, database administration, and Big Data technologies. Full software development life cycle (SDLC) experience covering system analysis, architecture, technical design, development, testing, deployment, and support of medium- to large-scale business applications using Agile Scrum and iterative development methodologies.
Experience in developing, implementing, configuring, and testing various systems using Hadoop technologies.
Good understanding of Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker, and TaskTracker, and of the YARN architecture.
Experience in using HiveQL for analyzing, querying, and summarizing large data sets.
Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
Used Pig as an ETL tool for transformations, joins, filters, and some pre-aggregation.
Developed User Defined Functions (UDFs) for Pig and Hive in Java-based languages.
Queried both Managed and External tables created by Hive using Impala.
Experience in loading logs from multiple sources directly into HDFS using Flume.
Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
Experience in scheduling jobs through Oozie and knowledge of Autosys, TAC, and Zena.
Experience in processing real-time data using Spark and Scala.
Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data.
Hands-on experience with message brokers such as Apache Kafka and RabbitMQ.
Experience working with MapR volumes and snapshots for data redundancy.
Experience in ingesting data into a Hadoop data lake from databases such as MySQL, Oracle, DB2, Teradata, and SQL Server using Sqoop.
Hands-on evaluation of ETL (Talend) and OLAP tools, recommending the most suitable solutions based on business requirements.
Experience in generating reports using Tableau by connecting to Hive.
Experience in using Kerberos to authenticate end users on Hadoop in secure mode.
Experience in working with file formats such as Parquet, Avro, RCFile, ORC, SequenceFile, and JSON.
Excellent knowledge of UNIX and shell scripting.
Expertise in the design and development of web applications using J2EE technologies, including Java, Spring, EJB, Hibernate, Servlets, JSP, Struts, Web Services, XML, JMS, and JDBC.
Extensive experience in using Relational databases like Oracle, SQL Server, DB2, Teradata and MySQL.
Experience in working with different Hadoop distributions such as CDH, MapR, and Hortonworks.
Expertise in using the Tomcat server and application servers such as JBoss and WebLogic.
Good knowledge of the finance and health care domains.

TECHNICAL SKILLS:

Hadoop / Big Data Stack: HDFS, YARN, MapReduce, Pig, Hive, Spark, Spark SQL, Scala, Kafka, ZooKeeper, HBase, Sqoop, Flume, shell scripting, Oozie.

Hadoop Distributions: MapR, Hortonworks.

Databases: Oracle, MySQL, DB2, Teradata, SQL Server, Sybase.

NoSQL Databases: HBase, Cassandra.

Query Languages: HiveQL, SQL, Pig.

Web Technologies: Java, Servlets, EJB, JavaScript, CSS, Bootstrap.

Frameworks: MVC, Struts, Spring, and Hibernate.

Build & Integration Tools: Maven, Ant, Jenkins.

Operating Systems: Windows, Linux, Unix and CentOS.

PROFESSIONAL EXPERIENCE:

Confidential

Sr. Big Data Engineer

Responsibilities:

Developed a Sqoop framework to ingest historical and incremental data from Oracle, DB2, SQL Server, and other sources.
Used Flume to read messages from a JMS queue and load them into HDFS.
Developed MapReduce programs in Java to parse the raw data, populate staging tables, and store the refined data in partitioned tables in HDFS.
Transformed raw data by developing Pig scripts and loaded the data into HBase tables.
Developed custom UDFs to generate unique keys for use in Pig transformations.
Designed the HBase schema to avoid hot spotting and exposed data from HBase tables to the UI through a REST API.
Identified control characters in the data and developed scripts to remove them.
Converted existing Pig scripts to Spark to improve performance.
Helped market analysts by creating Hive queries to spot emerging trends by comparing fresh data with HDFS reference tables and historical metrics.
Developed Spark code in Scala using RDDs and the DataFrame API for faster data processing.
Executed Spark SQL queries against Hive data through the Spark context and optimized their performance.
Created Kafka topics and partitions and wrote custom partitioner classes.
Defined job flows in Oozie to automate loading data into HDFS and processing it with Pig.
Involved in creating POCs to ingest and process streaming data using Spark Streaming and Kafka.
Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins in MapReduce.
Created branches in GitHub, pushed code, and deployed production releases through Jenkins.
Involved in the complete SDLC of a big data project, including requirement analysis, design, coding, testing, and production.

Environment: Hadoop, HDFS, MapR, Pig, Hive, Spark, Spark SQL, Scala, HBase, Oozie, Sqoop, Flume, Kafka, Linux, Java, Maven, JUnit, GitHub, Jenkins.

Confidential

Hadoop Developer

Responsibilities:

Implemented the EP Data Lake, a platform for managing data in a central location so that anyone in the firm can rapidly query, analyze, or refine the data in a standard way.
Involved in moving legacy data from a Sybase ASE data warehouse to the Hadoop data lake and migrating the data processing to the lake.
Responsible for creating the data store, datasets, and virtual warehouse in the lake, and then creating Spark and Hive refiners to implement the existing SQL stored procedures.
Developed MapReduce programs using Java to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
Created Hive refiners for simple UNIONs and JOINs.
Ran Hive queries through Spark SQL, which integrates with the Spark environment.
Automated the triggering of Data Lake REST API calls using Unix shell scripting.
Used Scala to test DataFrame transformations and debug data issues.
Redesigned and implemented the Scala REPL (read-eval-print loop) to integrate tightly with other IDE features in Eclipse.
Added AppDynamics monitoring to the JVM to gather statistics for the REST application.
Used the Avro format for staging data and Parquet for the final repository.
Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning to improve Hive performance and storage.
Worked on PURE MODEL, an in-house data modeling service: used data from the data lake virtual warehouse and exposed the data model output through Java web services accessed by end users.
Implemented daily cron jobs, using Oozie coordinator jobs, that automate the parallel tasks of loading data into HDFS and pre-processing it with Pig.
Used Sqoop import and export functionality to handle large data set transfers between the Sybase database and HDFS.
Tuned Hive and Pig scripts to improve performance.
Involved in creating Oozie workflow and coordinator jobs to trigger jobs based on time and data availability.
Performed unit testing and integration testing using the JUnit framework.
Configured build scripts for multi-module projects with Maven and Jenkins.
Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Environment: Hadoop, HDFS, Hortonworks, Spark, Spark SQL, Scala, Pig, Hive, Oozie, Sqoop, Sybase, Kafka, Linux, Java, Maven, JUnit, Jenkins.

Confidential

Java Developer

Responsibilities:

Involved in requirement analysis, design, and development of the application using Java technologies.
Developed the login screen so that the application can be accessed only by authorized and authenticated administrators.
Used HTML, CSS, and JSPs to design and develop the front end, and JavaScript to perform user validation.
Designed, developed, and configured server-side J2EE components such as EJBs, JavaBeans, and Servlets.
Involved in creating tables, functions, triggers, sequences, and stored procedures in PL/SQL.
Implemented business logic by developing Session Beans.
Involved in developing JSP pages using Struts custom tags, jQuery, and the Tiles framework.
Used Hibernate as the ORM and PL/SQL for handling database processing.
Used the JDBC API to communicate with the database.
Developed the application using the Waterfall software development methodology.
Involved in technical documentation of the project.

Environment: Java, HTML, CSS, JSP, Servlets, EJB, jQuery, JDBC, Hibernate, PL/SQL.
