Sr. Hadoop Big Data Engineer Resume CA - Hire IT People

SUMMARY:

6+ years of professional IT experience in Big Data, Hadoop, Java/J2EE and Cloud technologies in the Financial, Retail and Healthcare domains.
Transformed date-related data into an application-compatible format by developing Apache Pig UDFs.
Experience in building high-performance, scalable solutions using Hadoop ecosystem tools like Pig, Hive, Sqoop, Spark, Solr and Kafka.
Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
Handled data movement, transformation, analysis and visualization across the lake by integrating it with various tools.
Defined extract-transform-load (ETL) and extract-load-transform (ELT) processes for the Data Lake.
Strong ETL experience using Informatica PowerCenter 9.5.x/9.1/8.6.1/7.1/6.2/5.1.
Extensively worked on Spark and its components like Spark SQL, SparkR and Spark Streaming.
Defined real-time data streaming solutions across the cluster using Spark Streaming, Apache Storm, Kafka, NiFi and Flume.
Good expertise in planning, installing and configuring Hadoop clusters based on business needs.
Developed analytical components using Scala, Spark and Spark SQL.
Installed and configured multiple Hadoop clusters of different sizes with ecosystem components like Pig, Hive, Sqoop, Flume, HBase, Oozie and ZooKeeper.
Worked on all major Hadoop distributions: Cloudera (CDH4, CDH5), Hortonworks (HDP 2.2, 2.4) and Pivotal.
Experience in implementing failover mechanisms for NameNode, ResourceManager and Hive.
Configured AWS EC2 instances, S3 buckets and cloud services, and architected the flow of data to and from AWS.
Strong knowledge of NoSQL databases like Cassandra (column-oriented) and MongoDB (document-oriented) and their integration with Hadoop clusters.
Hands-on experience with UNIX and shell scripting to automate tasks.
Transformed and aggregated data for analysis by implementing workflow management of Sqoop, Hive and Pig scripts.
Experience working with file formats like Avro, Parquet, ORC and SequenceFile, and compression codecs like Gzip, LZO and Snappy in Hadoop.
Experience writing Oozie workflows and Job Controllers for job automation.
Integrated Oozie with Hue and scheduled workflows for multiple Hive, Pig and Spark Jobs.
In-depth knowledge of Scala and experience building Spark applications using Scala.
Good experience working on Tableau and Spotfire.
Adequate knowledge of Scrum, Agile and Waterfall methodologies.
Experience in developing applications using Java, J2EE, JSP, MVC, Servlets, Struts, Hibernate, JDBC, JSF, EJB, XML, AJAX and web-based development tools.
Expertise in web technologies like HTML, CSS, PHP and XML.
Worked with various tools and IDEs like Eclipse, IBM Rational, Apache Ant, MS Office, PL/SQL Developer and SQL*Plus.
Highly motivated, with the ability to work independently or as an integral part of a team, and committed to the highest levels of professionalism.

TECHNICAL SKILLS:

Big Data / Hadoop: HDFS, MapReduce, HBase, Hive, Sqoop, Real-time/Stream Processing, Apache Storm, Apache Spark

Operating Systems: Windows 7/8/10, Linux (RHEL, Ubuntu), OS X

ETL/BI Tools: MSBI, Talend, Informatica PowerCenter 9.x/8.6

Programming Languages: C, Java, SQL, PL/SQL, Python, Ruby, Shell Scripting

Databases: Oracle 9i/10g/11g, MySQL, PostgreSQL, Crate.IO, CockroachDB, MongoDB 5.3.9

Web Technologies: HTML5, XML, JavaScript, Rails, AJAX

Web/App Servers: Apache Tomcat 6.0, Jetty

IDE Development Tools: Eclipse, NetBeans

Methodologies: Agile, Scrum and Waterfall

PROFESSIONAL EXPERIENCE:

Confidential, CA

Sr. Hadoop Big Data Engineer

Responsibilities:

Installed, configured and tested Hadoop ecosystem components like MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Hue and HBase.
Imported data from various sources into HDFS and Hive using Sqoop.
Exported data from HDFS into PostgreSQL using the Python-based HAWQ framework.
Involved in writing custom MapReduce, Pig and Hive programs.
Developed two Java applications: one parses the mainframe report and writes it to CSV files, and the other compares data from SQL Server against the mainframe report (.dat file) and generates a rip file.
Experience in writing custom UDFs in Java to extend Hive and Pig Latin functionality.
Created partitions and buckets in Hive for both managed and external tables to optimize performance.
Worked on several PoCs involving NoSQL databases like HBase, MongoDB and Cassandra.
Configured Tez as the execution engine for Hive queries to improve performance.
Developed a data pipeline using Kafka and Storm to store data in HDFS and performed real-time analytics on the incoming data.
Hands-on experience in Spark and Spark Streaming, creating RDDs and applying transformations and actions to them.
Configured Spark Streaming, using Scala, to receive real-time data from Kafka and store the stream data in HDFS.
Created a new data model that embeds NoSQL submodels within a relational data model by applying hybrid data modelling concepts.
In-depth knowledge of Scala and experience building Spark applications using Scala.
Configured Flume to stream data into HDFS and Hive using HDFS sinks and Hive sinks.
Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
Involved in scheduling the Oozie workflow engine to run multiple Hive, Pig and Spark jobs.
Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
Performed Hive performance tuning at all phases.
Developed Pig UDFs in Java for cleaning bad records/data.
Coordinated all testing phases and worked closely with the performance testing team to create a baseline for the new application.
Experience in commissioning, decommissioning, balancing and managing nodes, and in tuning servers for optimal cluster performance.

Environment: Hadoop, JEE 8, MongoDB 3.5.9, HDFS, Pig, Hive, MapReduce, Sqoop, Linux and Big Data.

Confidential

Sr. Hadoop Big Data Engineer

Responsibilities:

Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase and HDFS.
Designed and implemented a semi-structured data analytics platform leveraging Hadoop.
Worked on performance analysis and improvements for Hive and Pig scripts at MapReduce job tuning level.
Used Sqoop to load data from RDBMS into HDFS.
Implemented several PoCs to validate and fit various Hadoop ecosystem tools on CDH and Hortonworks distributions.
Involved in Hadoop cluster tasks like adding and removing nodes without affecting running jobs or data.
Designed and implemented error-free Data Warehouse ETL and Hadoop integration.
Proficient in data modelling with Hive partitioning, bucketing and other Hive optimization techniques.
Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
Developed a workflow in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
Set up standards and processes for Hadoop based application design and implementation.
Wrote shell scripts for several day-to-day processes and worked on their automation.
Collected log data from web servers and integrated it into HDFS using Flume.
Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
Worked on establishing connectivity between Tableau and Spotfire.

Environment: Hadoop, HDFS, MapReduce, MongoDB, Java/JEE 7, VMware 5.1, Eclipse, Pig, Hive, HBase, Sqoop, Flume, Linux, UNIX.

Confidential, King of Prussia, PA

J2EE/Hadoop Developer

Responsibilities:

Responsible for building scalable distributed data solutions using Hadoop.
Collected and downloaded sensor-generated data on patients' body activity into HDFS.
Performed necessary transformations and aggregation to build the common learner data model in a NoSQL store (HBase).
Used Pig, Hive and MapReduce to analyze health insurance data and patient information.
Developed a workflow in Oozie to orchestrate a series of Pig scripts that remove, merge and compress files using Pig pipelines in the data preparation stage.
Used Pig UDFs written in Python and Java, and applied sampling to large data sets.
Moved all log files generated from various sources to HDFS through Flume for further processing.
Extensively used Pig to communicate with Hive and HBase using HCatalog and handlers.
Involved in transferring data from legacy tables to HDFS and HBase tables using Sqoop.
Implemented test scripts to support test driven development and continuous integration.
Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
Good understanding of ETL tools and their application to Big Data environments.

Environment: Hadoop, MapReduce, Spark, HDFS, Hive, Pig, Oozie, Core Java, HBase, Flume, Cloudera, Oracle 10g, UNIX Shell Scripting.

Confidential, St Petersburg, FL

Java Developer

Responsibilities:

Designed the application in J2EE architecture and developed dynamic and browser compatible User Interfaces for on-line account management, order and payment processing.
Used Hibernate object-relational mapping (ORM) to achieve data persistence.
Developed Servlets and JSPs based on the MVC pattern using the Spring Framework.
Developed required helper classes following Core Java multi-threaded programming.
Developed the presentation layer using JSP, tag libraries, HTML and CSS, with client-side validation in JavaScript.
Developed Hibernate DAO classes using Spring JdbcTemplate, with methods in the DAO layer to persist POJOs in the database.
Designed and developed Web services based on SOAP and WSDL for handling transaction history.
Involved in designing and developing JSON and XML objects with MySQL.
Developed web applications using Spring MVC and jQuery, and implemented the Spring dependency injection mechanism.
Integrated the user interface, server layer and persistence layer using Spring IoC, AOP and Spring MVC integration with OBPM and Hibernate.
Developed data access classes using JDBC, created SQL queries and used PL/SQL procedures with Oracle Database.
Used Log4j and JUnit for debugging, testing and maintaining the system state, and tested the website with older and latest versions/releases of multiple browsers.
Implemented test cases for unit testing of modules using JUnit and used Ant for building the project.
Provided production support for two applications built on the Swing and Struts frameworks.

Environment: JDK 1.6, JSP, HTML, JavaScript, JSON, XML, jQuery, Servlets, Spring MVC, Hibernate, Web Services, SOAP, NetBeans.
