Sr. Big Data Engineer Resume

SUMMARY

  • Overall 7 years of progressive IT experience in analysis, enterprise application development, database administration and Big Data technologies.
  • Complete software development life cycle (SDLC) experience covering system analysis, architecture, technical design, development, testing, deployment and support of medium to large-scale business applications, using Agile Scrum and iterative development methodologies.
  • Experience in developing, implementing, configuring and testing various systems using Hadoop technologies.
  • Good understanding of Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker, as well as the YARN architecture.
  • Experience in using HiveQL for analyzing, querying and summarizing large data sets.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Used Pig as an ETL tool for transformations, joins, filtering and pre-aggregation.
  • Developed User Defined Functions (UDFs) for Pig and Hive in Java (a minimal sketch appears after this list).
  • Queried both managed and external Hive tables using Impala.
  • Experience in loading logs from multiple sources directly into HDFS using Flume.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Experience in scheduling jobs through Oozie, plus knowledge of Autosys, TAC and Zena.
  • Experience in processing real-time data using Spark and Scala.
  • Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data.
  • Hands-on experience with message brokers such as Apache Kafka and RabbitMQ.
  • Experience working with MapR volumes and snapshots for data redundancy.
  • Experience in ingesting data into the Hadoop data lake from databases such as MySQL, Oracle, DB2, Teradata and SQL Server using Sqoop.
  • Hands-on evaluation of ETL (Talend) and OLAP tools, recommending the most suitable solutions based on business requirements.
  • Experience in generating reports using Tableau by connecting to Hive.
  • Experience in using Kerberos to authenticate end users on a secured Hadoop cluster.
  • Experience working with file formats such as Parquet, Avro, RCFile, ORC, SequenceFile and JSON.
  • Excellent knowledge of UNIX and shell scripting.
  • Expertise in design and development of web applications using J2EE technologies: Java, Spring, EJB, Hibernate, Servlets, JSP, Struts, Web Services, XML, JMS and JDBC.
  • Extensive experience with relational databases such as Oracle, SQL Server, DB2, Teradata and MySQL.
  • Experience working with different Hadoop distributions: CDH, MapR and Hortonworks.
  • Expertise in using the Tomcat web server as well as application servers such as JBoss and WebLogic.
  • Good knowledge of the Finance and Health Care domains.
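
As a hedged illustration of the Hive UDF work noted above, here is a minimal Java sketch using Hive's classic UDF API; the class name and normalization logic are illustrative assumptions:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: trims and lower-cases a free-text column so that
    // values compare consistently in HiveQL GROUP BYs and joins.
    public final class NormalizeText extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null; // preserve SQL NULL semantics
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

Packaged into a JAR and added to the session, such a function would be registered with CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText' and then called like any built-in function.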

TECHNICAL SKILLS

Hadoop / Big Data Stack: HDFS, YARN, MapReduce, Pig, Hive, Spark, Spark SQL, Scala, Kafka, ZooKeeper, HBase, Sqoop, Flume, shell scripting, Oozie.

Hadoop Distributions: MapR, Hortonworks.

Databases: Oracle, MySQL, DB2, Teradata, SQL Server, Sybase.

NoSQL Databases: HBase, Cassandra.

Query Languages: HiveQL, SQL, Pig Latin.

Web Technologies: Java, Servlets, EJB, JavaScript, CSS, Bootstrap.

Frameworks: MVC, Struts, Spring and Hibernate.

Build & Integration Tools: Maven, Ant, Jenkins.

Operating Systems: Windows, Linux, Unix and CentOS.

PROFESSIONAL EXPERIENCE

Confidential

Sr. Big Data Engineer

Responsibilities:

  • Developed a Sqoop framework to ingest historical and incremental data from Oracle, DB2, SQL Server and other sources.
  • Worked on Flume to read messages from a JMS queue and load them into HDFS.
  • Developed MapReduce programs in Java to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
  • Transformed raw data by developing Pig scripts and loaded the data into HBase tables.
  • Developed custom UDFs to generate unique keys for use in Pig transformations.
  • Designed the HBase schema to avoid region hotspotting and exposed data from HBase tables to the UI through a REST API.
  • Identified control characters in the data and developed scripts to remove them.
  • Converted existing Pig scripts to Spark to improve performance.
  • Helped market analysts spot emerging trends by creating Hive queries that compare fresh data with HDFS reference tables and historical metrics.
  • Developed Spark code in Scala for faster data processing using the RDD and DataFrame APIs.
  • Executed Spark SQL queries against Hive data in the Spark context and tuned them for performance.
  • Worked on creating Kafka topics and partitions and writing custom partitioner classes (see the sketch after this list).
  • Defined job flows in Oozie to automate loading data into HDFS and processing it with Pig.
  • Involved in creating POCs to ingest and process streaming data using Spark Streaming and Kafka.
  • Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins in MapReduce.
  • Created branches in GitHub, pushed the code and deployed to production through Jenkins for each production release.
  • Involved in the complete SDLC of a big data project, including requirement analysis, design, coding, testing and production support.
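
As a hedged illustration of the custom partitioner work above, here is a minimal Java sketch against Kafka's standard producer Partitioner interface; the class name and keying scheme are illustrative assumptions:

    import java.util.Map;
    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;

    // Hypothetical partitioner: routes records that share a business key to
    // the same partition so consumers see them in order. Assumes non-null keys.
    public class BusinessKeyPartitioner implements Partitioner {
        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            int numPartitions = cluster.partitionCountForTopic(topic);
            // Mask the sign bit so the modulo result is always non-negative.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }

        @Override public void close() { }

        @Override public void configure(Map<String, ?> configs) { }
    }

A producer would opt into such a class through its partitioner.class configuration property.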

Environment: Hadoop, HDFS, MapR, Pig, Hive, Spark, Spark SQL, Scala, HBase, Oozie, Sqoop, Flume, Kafka, Linux, Java, Maven, JUnit, GitHub, Jenkins.

Confidential

Hadoop Developer

Responsibilities:

  • Implemented the EP Data Lake, which provides a platform to manage data in a central location so that anyone in the firm can rapidly query, analyze or refine the data in a standard way.
  • Involved in moving legacy data from the Sybase ASE data warehouse to the Hadoop data lake and migrating the data processing into the lake.
  • Responsible for creating the data store, datasets and virtual warehouse in the lake, then creating Spark and Hive refiners to reimplement the existing SQL stored procedures (see the sketch after this list).
  • Developed MapReduce programs in Java to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
  • Created Hive refiners for simple UNIONs and JOINs.
  • Ran Hive queries through Spark SQL, which integrates with the Spark environment.
  • Automated the triggering of data lake REST API calls using UNIX shell scripting.
  • Used Scala to test DataFrame transformations and debug data issues.
  • Redesigned and implemented the Scala REPL (read-evaluate-print loop) to integrate tightly with other IDE features in Eclipse.
  • Added AppDynamics monitoring to the JVM to gather statistics for the REST application.
  • Used the Avro format for staging data and Parquet for the final repository.
  • Worked on SequenceFiles, RCFiles, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
  • Worked on an in-house data modeling service (i.e., PURE MODEL), using data from the data lake virtual warehouse and exposing the model output through Java web services consumed by end users.
  • Implemented daily cron jobs that automate the parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
  • Used Sqoop import and export functionality to handle large data set transfers between the Sybase database and HDFS.
  • Worked on tuning Hive and Pig scripts to improve performance.
  • Involved in creating Oozie workflow and coordinator jobs that kick off jobs based on time and data availability.
  • Performed unit testing and integration testing using the JUnit framework.
  • Configured build scripts for multi-module projects with Maven and Jenkins.
  • Followed a story-driven Agile development methodology and actively participated in daily scrum meetings.
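
For illustration, a minimal Java sketch of a Spark SQL refiner of the kind described above, replacing a stored procedure's UNION/JOIN logic with a query over Hive tables; the database, table and column names are hypothetical, and the sketch uses the SparkSession entry point:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class TradeRefiner {
        public static void main(String[] args) {
            // Hive support lets Spark SQL read the lake's tables directly.
            SparkSession spark = SparkSession.builder()
                    .appName("trade-refiner")
                    .enableHiveSupport()
                    .getOrCreate();

            // Hypothetical refiner standing in for a SQL stored procedure:
            // join current trades to desk reference data, then append history.
            Dataset<Row> refined = spark.sql(
                    "SELECT t.trade_id, t.quantity, d.desk_name "
                  + "FROM lake.trades_current t "
                  + "JOIN lake.desk_ref d ON t.desk_id = d.desk_id "
                  + "UNION ALL "
                  + "SELECT trade_id, quantity, desk_name FROM lake.trades_history");

            refined.write().mode("overwrite").saveAsTable("lake.trades_refined");
            spark.stop();
        }
    }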

Environment: Hadoop, HDFS, Hortonworks, Spark, Spark SQL, Scala, Pig, Hive, Oozie, Sqoop, Sybase, Kafka, Linux, Java, Maven, JUnit, Jenkins.

Confidential

Java Developer

Responsibilities:

  • Involved in requirement analysis, design and development of the application using Java technologies.
  • Developed the login screen so that the application can be accessed only by authorized, authenticated administrators.
  • Used HTML, CSS and JSPs to design and develop the front end, and used JavaScript to perform user validation.
  • Designed, developed and configured server-side J2EE components such as EJBs, JavaBeans and Servlets.
  • Involved in creating tables, functions, triggers, sequences and stored procedures in PL/SQL.
  • Implemented business logic by developing session beans.
  • Involved in developing JSP pages using Struts custom tags, jQuery and the Tiles framework.
  • Used Hibernate as the ORM and PL/SQL for database processing.
  • Used the JDBC API to communicate with the database (see the sketch after this list).
  • Developed the application following the Waterfall methodology.
  • Involved in the technical documentation of the project.
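
As a hedged sketch of the JDBC and PL/SQL work above, here is a minimal Java method that invokes a stored procedure through a CallableStatement; the connection string, credentials and procedure name are illustrative assumptions:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Types;

    public class AdminDao {
        // Hypothetical PL/SQL call: returns the number of active administrators.
        public int countActiveAdmins() throws SQLException {
            try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/APPDB", "app_user", "secret");
                 CallableStatement cs = con.prepareCall(
                     "{ call admin_pkg.count_active(?) }")) {
                cs.registerOutParameter(1, Types.INTEGER);
                cs.execute();
                return cs.getInt(1);
            }
        }
    }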

Environment: Java, HTML, CSS, JSP, Servlets, EJB, jQuery, JDBC, Hibernate, PL/SQL.
