Sr. Big Data Engineer Resume
SUMMARY:
- Overall 7 years of progressive IT experience in analysis, enterprise application development, database administration and big data technologies. Complete software development life cycle (SDLC) experience covering system analysis, architecture, technical design, development, testing, deployment and support of medium- to large-scale business applications using Agile Scrum and iterative development methodologies.
- Experience in developing, implementing, configuring, testing various systems using Hadoop technologies.
- Good understanding of Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker, and of the YARN architecture.
- Experience in using HiveQL for analyzing, querying and summarizing huge data sets.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Used Pig as an ETL tool for transformations, joins, filters and some pre-aggregation.
- Developed User Defined Functions (UDFs) for Pig and Hive using Java-based languages.
- Queried both Managed and External tables created by Hive using Impala.
- Experience in loading logs from multiple sources directly into HDFS using Flume.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Experience in scheduling jobs through Oozie and knowledge of Autosys, TAC and Zena.
- Experience in processing of real-time data using Spark and Scala.
- Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data.
- Hands-on experience with message brokers such as Apache Kafka and RabbitMQ.
- Experience working with MapR volumes and snapshots for data redundancy.
- Experience in fetching data into a Hadoop data lake from various databases such as MySQL, Oracle, DB2, Teradata and SQL Server using Sqoop.
- Hands-on evaluation of ETL (Talend) and OLAP tools, recommending the most suitable solutions based on business requirements.
- Experience in generating reports using Tableau by connecting to Hive.
- Experience in using Kerberos to authenticate end users of Hadoop in secure mode.
- Experience in working with file formats such as Parquet, Avro, RC, ORC, Sequence Files and JSON.
- Excellent knowledge of UNIX and shell scripting.
- Expertise in design and development of Web Applications involving J2EE technologies with Java, Spring, EJB, Hibernate, Servlets, JSP, Struts, Web Services, XML, JMS, JDBC etc.
- Extensive experience in using Relational databases like Oracle, SQL Server, DB2, Teradata and MySQL.
- Experience in working with different Hadoop distributions such as CDH, MapR and Hortonworks.
- Expertise in using the Tomcat server and application servers such as JBoss and WebLogic.
- Good knowledge of the Finance and Health Care domains.
TECHNICAL SKILLS:
Hadoop / Big Data Stack: HDFS, YARN, MapReduce, Pig, Hive, Spark, Spark SQL, Scala, Kafka, ZooKeeper, HBase, Sqoop, Flume, Shell scripting, Oozie.
Hadoop Distributions: MapR, Hortonworks.
Databases: Oracle, MySQL, DB2, Teradata, SQL Server, Sybase.
NoSQL Databases: HBase, Cassandra.
Query Languages: HiveQL, SQL, Pig Latin.
Web Technologies: Java, Servlets, EJB, JavaScript, CSS, Bootstrap.
Frameworks: MVC, Struts, Spring and Hibernate.
Build & Integration Tools: Maven, Ant, Jenkins.
Operating Systems: Windows, Linux, Unix and CentOS.
PROFESSIONAL EXPERIENCE:
Confidential
Sr. Big Data Engineer
Responsibilities:
- Developed a Sqoop framework to ingest historical and incremental data from Oracle, DB2, SQL Server and other sources.
- Used Flume to read messages from a JMS queue and load them into HDFS.
- Developed MapReduce programs using Java to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
- Transformed raw data by developing Pig scripts and loaded the data into HBase tables.
- Developed custom UDFs to generate unique keys for use in Pig transformations.
- Designed the HBase schema to avoid hotspotting and exposed the data from HBase tables to the UI through a REST API.
- Identified control characters in the data and developed scripts to remove them.
- Converted existing Pig scripts to Spark to improve performance.
- Helped market analysts by creating Hive queries to spot emerging trends by comparing fresh data with HDFS reference tables and historical metrics.
- Developed Spark code in Scala for faster data processing using RDDs and the DataFrame API.
- Executed Spark SQL queries against Hive data through the Spark context and tuned their performance.
- Created Kafka topics and partitions and wrote custom partitioner classes.
- Defined job flows in Oozie to automate loading data into HDFS and processing it with Pig.
- Involved in creating POCs to ingest and process streaming data using Spark Streaming and Kafka.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins in MapReduce.
- Created branches in GitHub, pushed code and deployed production releases through Jenkins.
- Involved in the complete SDLC of a big data project, including requirement analysis, design, coding, testing and production.
Environment: Hadoop, HDFS, MapR, Pig, Hive, Spark, Spark SQL, Scala, HBase, Oozie, Sqoop, Flume, Kafka, Linux, Java, Maven, JUnit, GitHub, Jenkins.
Confidential
Hadoop Developer
Responsibilities:
- Implemented the EP Data Lake, which provides a platform to manage data in a central location so that anyone in the firm can rapidly query, analyze or refine the data in a standard way.
- Involved in moving legacy data from the Sybase ASE data warehouse to the Hadoop Data Lake and migrating the data processing to the lake.
- Responsible for creating Data store, Datasets and Virtual Warehouse in the lake and then creating Spark and Hive refiners to implement the existing SQL Stored Procedures.
- Developed MapReduce programs using Java to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
- Created Hive refiners for simple UNIONs and JOINs.
- Ran Hive queries through Spark SQL, which integrates with the Spark environment.
- Automated the triggering of Data Lake REST API calls using Unix Shell Scripting.
- Used Scala to test DataFrame transformations and debug data issues.
- Redesigned and implemented the Scala REPL (read-eval-print loop) to integrate tightly with other IDE features in Eclipse.
- Added AppDynamics monitoring to the JVM to gather statistics for the REST application.
- Used Avro format for staging data and Parquet for final repository.
- Worked on Sequence files, RC files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
- Worked on the in-house data modeling service (PURE MODEL): used data from the data lake virtual warehouse and exposed the data model output to Java web services accessed by end users.
- Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig using Oozie coordinator jobs.
- Used Sqoop import and export functionality to handle large data set transfers between the Sybase database and HDFS.
- Tuned Hive and Pig scripts to improve performance.
- Involved in creating Oozie workflow and coordinator jobs to trigger jobs based on time and data availability.
- Performed unit testing and integration testing using the JUnit framework.
- Configured build scripts for multi-module projects with Maven and Jenkins.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop, HDFS, Hortonworks, Spark, Spark SQL, Scala, Pig, Hive, Oozie, Sqoop, Sybase, Kafka, Linux, Java, Maven, JUnit, Jenkins.
Confidential
Java Developer
Responsibilities:
- Involved in requirement analysis, design and development of the application using Java technologies.
- Developed the login screen so that the application can be accessed only by authorized and authenticated administrators.
- Used HTML, CSS and JSPs to design and develop the front end, and used JavaScript to perform user validation.
- Designed, developed and configured server-side J2EE components such as EJBs, JavaBeans and Servlets.
- Involved in creating tables, functions, triggers, sequences and stored procedures in PL/SQL.
- Implemented business logic by developing Session Beans.
- Involved in developing JSP pages using Struts custom tags, jQuery and the Tiles framework.
- Used Hibernate as the ORM and PL/SQL for handling database processing.
- Used the JDBC API to communicate with the database.
- Developed the application following the Waterfall methodology.
- Involved in technical documentation of the project.
Environment: Java, HTML, CSS, JSP, Servlets, EJB, jQuery, JDBC, Hibernate, PL/SQL.
