
Sr. Hadoop Big Data Engineer Resume


CA

SUMMARY:

  • 6+ years of professional IT experience in Big Data, Hadoop, Java/J2EE and Cloud technologies across the Financial, Retail and Healthcare domains.
  • Transformed date-related data into application-compatible formats by developing Apache Pig UDFs (see the sketch after this list).
  • Experience building high-performance, scalable solutions using Hadoop ecosystem tools such as Pig, Hive, Sqoop, Spark, Solr and Kafka.
  • Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
  • Handled data movement, transformation, analysis and visualization across the lake by integrating it with various tools.
  • Defined extract-transform-load (ETL) and extract-load-transform (ELT) processes for the Data Lake.
  • Strong ETL experience using Informatica PowerCenter 9.5.x/9.1/8.6.1/7.1/6.2/5.1.
  • Extensively worked on Spark and its components, including Spark SQL, SparkR and Spark Streaming.
  • Defined real-time data streaming solutions across the cluster using Spark Streaming, Apache Storm, Kafka, NiFi and Flume.
  • Expertise in planning, installing and configuring Hadoop clusters based on business needs.
  • Developed analytical components using Scala, Spark and Spark SQL.
  • Installed and configured multiple Hadoop clusters of different sizes and with ecosystem components like Pig, Hive, Sqoop, Flume, HBase, Oozie and Zookeeper.
  • Worked on all major Hadoop distributions: Cloudera (CDH4, CDH5), Hortonworks (HDP 2.2, 2.4) and Pivotal.
  • Experience implementing failover mechanisms for the NameNode, ResourceManager and Hive.
  • Configured AWS EC2 instances, S3 buckets and cloud services, and architected the flow of data to and from AWS.
  • Strong knowledge of NoSQL databases such as Cassandra (column-oriented) and MongoDB (document-oriented) and their integration with Hadoop clusters.
  • Hands-on experience with UNIX and shell scripting for automation.
  • Transformed and aggregated data for analysis by implementing workflow management of Sqoop, Hive and Pig scripts.
  • Experience working with file formats such as Avro, Parquet, ORC and SequenceFile, and compression codecs such as Gzip, LZO and Snappy in Hadoop.
  • Experience writing Oozie workflows and Job Controllers for job automation.
  • Integrated Oozie with Hue and scheduled workflows for multiple Hive, Pig and Spark Jobs.
  • In-depth knowledge of Scala and experience building Spark applications with it.
  • Good experience working with Tableau and Spotfire.
  • Working knowledge of Scrum, Agile and Waterfall methodologies.
  • Experience developing applications using Java, J2EE, JSP, MVC, Servlets, Struts, Hibernate, JDBC, JSF, EJB, XML, AJAX and web-based development tools.
  • Expertise in web technologies such as HTML, CSS, PHP and XML.
  • Worked with various tools and IDEs including Eclipse, IBM Rational, Apache Ant, MS Office, PL/SQL Developer and SQL*Plus.
  • Highly motivated, able to work independently or as an integral part of a team, and committed to the highest levels of professionalism.
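
A minimal sketch of the kind of date-normalizing Pig UDF described above. The class name and the input/output date patterns are illustrative assumptions, not the original code:

    import java.io.IOException;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Hypothetical UDF: normalizes MM/dd/yyyy date strings to yyyy-MM-dd.
    public class ToIsoDate extends EvalFunc<String> {
        private final SimpleDateFormat in = new SimpleDateFormat("MM/dd/yyyy");
        private final SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd");

        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null; // pass nulls through; Pig treats them as missing values
            }
            try {
                return out.format(in.parse((String) input.get(0)));
            } catch (ParseException e) {
                return null; // unparseable dates are dropped as bad records
            }
        }
    }

In Pig Latin the jar would be registered and the function invoked along the lines of REGISTER date-udfs.jar; B = FOREACH A GENERATE ToIsoDate(order_date); (jar, relation and field names assumed).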

TECHNICAL SKILLS:

Big Data / Hadoop: HDFS, MapReduce, HBase, Hive, Sqoop, real-time/stream processing, Apache Storm, Apache Spark

Operating Systems: Windows 7/8/10, Linux (RHEL, Ubuntu), OS X

ETL/BI Tools: MSBI, Talend, Informatica PowerCenter 9.x/8.6

Programming Language: C, Java, SQL, PL/SQL, Python, Ruby, Shell Scripting

Databases: Oracle 9i/10g/11g, MySQL, PostgreSQL, Crate.IO, CockroachDB, MongoDB 5.3.9

Web Technologies: HTML5, XML, JavaScript, Rails, AJAX

Web/App Servers: Apache Tomcat 6.0, Jetty

IDE Development Tools: Eclipse, NetBeans

Methodologies: Agile, Scrum and Waterfall

PROFESSIONAL EXPERIENCE:

Confidential, CA

Sr. Hadoop Big Data Engineer

Responsibilities:

  • Installed, configured and tested Hadoop ecosystem components such as MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Hue and HBase.
  • Imported data from various sources into HDFS and Hive using Sqoop.
  • Exported data from HDFS into PostgreSQL using a Python-based HAWQ framework.
  • Involved in writing custom MapReduce, Pig and Hive programs.
  • Developed a Java application that parses mainframe reports into CSV files, and a companion application that compares data from SQL Server against the mainframe report (.dat file) and generates a rip file.
  • Wrote custom UDFs in Java to extend Hive and Pig Latin functionality (see the Hive UDF sketch after this list).
  • Created partitions and buckets in Hive for both managed and external tables to optimize performance.
  • Worked on several PoCs involving NoSQL databases such as HBase, MongoDB and Cassandra.
  • Configured Tez as the execution engine for Hive queries to improve performance.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS and performed real-time analytics on the incoming data.
  • Hands-on experience with Spark and Spark Streaming: creating RDDs and applying transformations and actions to them.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream in HDFS using Scala (see the sketch after this list).
  • Created a new data model that embeds NoSQL submodels within a relational data model by applying hybrid data-modelling concepts.
  • In-depth knowledge of Scala and experience building Spark applications with it.
  • Configured Flume to stream data into HDFS and Hive using HDFS and Hive sinks.
  • Collected and aggregated large volumes of log data using Apache Flume and staged it in HDFS for further analysis.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive, Pig and Spark jobs.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Performed Hive performance tuning across all phases of the project.
  • Developed Pig UDFs in Java for cleaning bad records/data.
  • Coordinated all testing phases and worked closely with the performance testing team to create a baseline for the new application.
  • Experience commissioning, decommissioning, balancing and managing nodes, and tuning servers for optimal cluster performance.
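
A minimal sketch of a custom Hive UDF of the kind described above. The masking logic, class name and column are hypothetical, not the original code:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical Hive UDF: masks all but the last four characters of an ID.
    public final class MaskId extends UDF {
        public Text evaluate(Text id) {
            if (id == null) return null;
            String s = id.toString();
            if (s.length() <= 4) return id; // too short to mask meaningfully
            return new Text("****" + s.substring(s.length() - 4));
        }
    }

It would be wired into Hive with ADD JAR udfs.jar; CREATE TEMPORARY FUNCTION mask_id AS 'MaskId'; and then called as mask_id(member_id) in queries (jar and names assumed).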
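
The Spark Streaming job above was written in Scala; the sketch below shows the same Kafka-to-HDFS pattern in Java, to match this document's other examples. Broker, topic, group id and output path are assumptions:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaToHdfs {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("KafkaToHdfs");
            JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092"); // assumed broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "hdfs-sink");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Arrays.asList("events"), kafkaParams)); // assumed topic

            // Write each non-empty micro-batch to a time-stamped HDFS directory.
            stream.map(ConsumerRecord::value)
                  .foreachRDD((rdd, time) -> {
                      if (!rdd.isEmpty()) {
                          rdd.saveAsTextFile("hdfs:///data/events/" + time.milliseconds());
                      }
                  });

            jssc.start();
            jssc.awaitTermination();
        }
    }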

Environment: Hadoop, Java EE 8, MongoDB 3.5.9, HDFS, Pig, Hive, MapReduce, Sqoop, Linux, Big Data.

Confidential

Sr. Hadoop Big Data Engineer

Responsibilities:

  • Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase and HDFS.
  • Designed and implemented a semi-structured data analytics platform leveraging Hadoop.
  • Worked on performance analysis and improvements for Hive and Pig scripts at the MapReduce job-tuning level (see the driver sketch after this list).
  • Used Sqoop to load data from RDBMS into HDFS.
  • Implemented several PoCs to validate and fit Hadoop ecosystem tools on the CDH and Hortonworks distributions.
  • Performed Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
  • Designed and implemented error-free data warehouse ETL and Hadoop integration.
  • Proficient in data modelling with Hive partitioning, bucketing and other Hive optimization techniques.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Pig.
  • Set up standards and processes for Hadoop-based application design and implementation.
  • Wrote shell scripts for several day-to-day processes and worked on their automation.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
  • Worked on establishing connectivity between Tableau and Spotfire.
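
A minimal sketch of a MapReduce job with the tuning knobs mentioned above: a map-side combiner to cut shuffle volume, and an explicit reducer count sized to the cluster. The job (counting page hits from web server logs), log layout and reducer count are assumptions:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PageHitCount {

        public static class HitMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text page = new Text();

            @Override
            protected void map(LongWritable key, Text line, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = line.toString().split("\\s+");
                if (fields.length > 6) {   // assumed common-log layout: URL in field 6
                    page.set(fields[6]);
                    ctx.write(page, ONE);
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : vals) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "page-hit-count");
            job.setJarByClass(PageHitCount.class);
            job.setMapperClass(HitMapper.class);
            job.setCombinerClass(SumReducer.class); // combine map-side to cut shuffle volume
            job.setReducerClass(SumReducer.class);
            job.setNumReduceTasks(8);               // sized to the cluster, not the default 1
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }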

Environment: Hadoop, HDFS, MapReduce, MongoDB, Java/Java EE 7, VMware 5.1, Hive, Eclipse, Pig, HBase, Sqoop, Flume, Linux, UNIX.

Confidential, King of Prussia, PA

J2EE/Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Collected and downloaded sensor data generated by patients' body activity to HDFS.
  • Performed the necessary transformations and aggregations to build the common learner data model in the NoSQL store HBase (see the sketch after this list).
  • Used Pig, Hive and MapReduce to analyze health insurance data and patient information.
  • Developed a workflow in Oozie to orchestrate a series of Pig scripts that remove, merge and compress files using Pig pipelines in the data preparation stage.
  • Wrote Pig UDFs in Python and Java and used sampling of large data sets.
  • Moved log files generated from various sources to HDFS for further processing through Flume.
  • Extensively used Pig to communicate with Hive and HBase using HCatalog and storage handlers.
  • Involved in transforming data from legacy tables to HDFS and HBase tables using Sqoop.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Good understanding of ETL tools and their application to Big Data environments.
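
A minimal sketch of writing an aggregated record into the HBase learner model described above. Table, column family, qualifiers and values are illustrative assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LearnerModelWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("learner_model"))) {
                // Row key: patient id; column family "vitals" holds aggregated readings.
                Put put = new Put(Bytes.toBytes("patient-0042"));
                put.addColumn(Bytes.toBytes("vitals"), Bytes.toBytes("avg_heart_rate"),
                              Bytes.toBytes("72"));
                put.addColumn(Bytes.toBytes("vitals"), Bytes.toBytes("steps_per_day"),
                              Bytes.toBytes("8450"));
                table.put(put);
            }
        }
    }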

Environment: Hadoop, MapReduce, Spark, HDFS, Hive, Pig, Oozie, Core Java, HBase, Flume, Cloudera, Oracle 10g, UNIX Shell Scripting.

Confidential, St Petersburg, FL

Java Developer

Responsibilities:

  • Designed the application in a J2EE architecture and developed dynamic, browser-compatible user interfaces for online account management, order processing and payment processing.
  • Used Hibernate object-relational mapping (ORM) to achieve data persistence.
  • Developed Servlets and JSPs based on the MVC pattern using the Spring Framework.
  • Developed required helper classes following Core Java multi-threaded programming practices.
  • Developed the presentation layer using JSP, tag libraries, HTML and CSS, with client-side validation in JavaScript.
  • Developed Hibernate DAO classes using Spring's JdbcTemplate and DAO-layer methods to persist POJOs in the database (see the sketch after this list).
  • Designed and developed web services based on SOAP and WSDL for handling transaction history.
  • Involved in designing and developing JSON and XML objects with MySQL.
  • Developed web applications using Spring MVC and jQuery, and implemented Spring's dependency injection mechanism.
  • Integrated the user interface, service layer and persistence layer using Spring IoC, AOP and Spring MVC integration with OBPM and Hibernate.
  • Developed data access classes using JDBC, and created SQL queries and PL/SQL procedures against an Oracle database.
  • Used Log4j and JUnit for debugging, testing and maintaining system state, and tested the website on multiple browsers across older and latest releases.
  • Implemented JUnit test cases for unit testing of modules and used Ant to build the project.
  • Provided production support for two applications involving the Swing and Struts frameworks.
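
A minimal sketch of a DAO in the JdbcTemplate style described above, written JDK 1.6-era (anonymous RowMapper, no lambdas). The table, columns and Account POJO are illustrative assumptions:

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.jdbc.core.RowMapper;

    // Hypothetical DAO; names are illustrative, not the original code.
    public class AccountDao {
        private final JdbcTemplate jdbcTemplate;

        public AccountDao(DataSource dataSource) {
            this.jdbcTemplate = new JdbcTemplate(dataSource);
        }

        public Account findById(long id) {
            return jdbcTemplate.queryForObject(
                "SELECT id, owner, balance FROM accounts WHERE id = ?",
                new Object[] { id },
                new RowMapper<Account>() {
                    public Account mapRow(ResultSet rs, int rowNum) throws SQLException {
                        return new Account(rs.getLong("id"),
                                           rs.getString("owner"),
                                           rs.getDouble("balance"));
                    }
                });
        }

        public int updateBalance(long id, double balance) {
            return jdbcTemplate.update(
                "UPDATE accounts SET balance = ? WHERE id = ?", balance, id);
        }
    }

    // Minimal POJO persisted by the DAO above.
    class Account {
        private final long id;
        private final String owner;
        private final double balance;

        Account(long id, String owner, double balance) {
            this.id = id;
            this.owner = owner;
            this.balance = balance;
        }
        // getters omitted for brevity
    }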

Environment: JDK 1.6, JSP, HTML, JavaScript, JSON, XML, jQuery, Servlets, Spring MVC, Hibernate, Web Services, SOAP, NetBeans.
