Hadoop/Big Data Consultant Resume
Bellevue, WA
SUMMARY:
- 8+ years of extensive IT experience, including 4 years of experience in Big Data and Hadoop ecosystem technologies.
- Experience with the Hadoop Ecosystem: Hortonworks 2.0/2.2/2.3, CDH 4.3.2/5.2, HDFS, MapReduce, YARN, Sqoop, Flume, Oozie, Pig, Hive, Scala, HBase, MongoDB, Cassandra, Solr, Spark, Storm and Kafka.
- Experience in importing and exporting data using Sqoop from relational database systems to HDFS and vice versa.
- Expertise in Hive Query Language (HiveQL), Hive Security and debugging Hive issues.
- Experience in performing extensive data validation using Hive dynamic partitioning and bucketing.
- Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HiveQL (a sample Hive UDF sketch follows this summary).
- Experience in Streaming the Data to HDFS using Flume.
- Expertise in writing ETL Jobs for analyzing data using Pig.
- Experience in NoSQL column-oriented databases like HBase and Cassandra and their integration with Hadoop clusters.
- Hands-on experience in using the MapReduce programming model for batch processing of data stored in HDFS.
- Experience in managing and reviewing Hadoop Log files.
- Experience with Solr in implementing indexes for fast data retrieval.
- Experience with the Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.
- Experience in Database Design, ER modeling, SQL, PL/SQL, procedures, functions, triggers.
- Extensive working knowledge of Teradata BTEQ and MultiLoad (MLoad) scripts.
- Good experience in developing a build script using Ant, Maven, Jenkins and Accurev.
- Good working knowledge on Github and Nexus code repositories.
- Good knowledge on Spark, Kafka and Storm.
- Experience working with Talend.
- Proficient in development methodologies such as Agile, Scrum and Waterfall.
- Extensive working knowledge on Shell Scripting.
- Good Knowledge on working with multiple file formats like Sequence files, Avro, RC and ORC File Formats.
- Worked with customers, end users to formulate and document business requirements.
- Worked closely with Data Architects for creating S2TM (Source to Target Mapping).
- Experience in understanding source data by performing source system analysis.
- Worked extensively on Business Requirements Analysis, Functional and Non-Functional requirements analysis, Risk Analysis and UAT.
- Experience in database development using SQL and PL/SQL and experience working on databases like Oracle 9i/10g, SQL Server, MySQL and Teradata.
- Good communication and inter-personal skills; a team player and contributor who delivers on schedule under tight deadlines.
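For illustration, a minimal Hive UDF sketch of the kind described above (the class name and behavior are hypothetical, not taken from any project codebase): it null-safely trims and upper-cases a string column, showing how Java logic is exposed to HiveQL through the classic Hive UDF API.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class CleanUpperUDF extends UDF {
    // Hive resolves evaluate() by reflection; returning null preserves NULL semantics.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Such a UDF is typically packaged in a JAR, added with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before being called like a built-in function in HiveQL.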
TECHNICAL SKILL:
Hadoop Distribution: Apache, Cloudera CDH, Hortonworks HDP
Big Data Technologies: Apache Hadoop (MRv1, MRv2), Hive, Pig, Sqoop, HBase, Flume, Zookeeper, Oozie, Ambari, Hue, Impala, Spark, Kafka and Storm
Operating Systems: Windows, Linux & Unix
Languages: C, Java, PL/SQL, Unix Shell, Python
Frameworks: Struts, Spring and Hibernate
Web Technologies: HTML, JSP, CSS, JavaScript
IDEs: Eclipse, IBM WebSphere
Webservers /App Servers: Apache Tomcat 6.0/7.0, IBM WebSphere 6.0/7.0, JBoss 4.3
Database: Oracle 8i/9i/10g, MySQL, HBase (NoSQL), MongoDB (NoSQL), Teradata
Test Management Tool: Quality Center, BugZilla, JIRA
PROFESSIONAL EXPERIENCE:
Confidential, Bellevue, WA
Hadoop/BigData Consultant
Responsibilities:- Understood the business needs and objectives of the system, interacted with the end client/users and gathered requirements for the integrated system.
- Worked as a Source System Analyst to understand various source systems by interacting with each source's SMEs.
- Worked as a Data Modeler/Architect to help Senior architects in designing the Target Data Models and S2TMs.
- Involved in various activities of the project like information gathering, analyzing the information, documenting the functional and nonfunctional requirements.
- Ingested data from different sources into Hadoop Data Lake using Sqoop.
- Performed data profiling activities such as data type validation and NULL checks on the ingested data using Java/Python MapReduce programs (see the null-check mapper sketch after this section).
- Used Pig as an ETL tool to do the business transformations and then load the transformed data into the target tables.
- Implemented incremental logic for data ingestion using HBase by setting a proclog entry for each data load.
- Involved in integrating Cassandra with Hadoop.
- Wrote a Storm topology to accept events from a Kafka producer and emit them into Cassandra (see the topology sketch after this section).
- Created HBase Tables for each source table for maintaining audit information.
- Created Hive external tables for each source table in Hadoop Data Lake.
- Created Hive partitions and buckets for tables on the basis of load date.
- Developed Pig Scripts for processing the Ingested data as per business requirement.
- Implemented SCD Type-2 Logic in processing layer for versioning the records using Java MapReduce.
- Converted SCD Type-1 tables to SCD Type-2 tables using Pig transformations and Java UDFs.
- Developed Pig UDFs in Java to convert date formats as required (a sample Pig UDF sketch follows this section).
- Created Oozie workflows to automate the actions required for each job.
- Experience working with Talend.
- Used Control-M as a job scheduler and monitoring tool.
- Developed Denodo views over source-like data to provide data availability for business users.
- Implemented Hadoop security using Kerberos with Active Directory.
- Experienced in using the Cloudera Hue interface for job monitoring and viewing table metadata.
- Used HCatalog as a table management solution for both Pig and Hive interfaces.
- Developed Shell Scripts for executing and parameterizing each Job.
- Used Kafka as middleware to send data from various sources to the Hadoop Data Lake.
- Implemented Spark as a processing engine, working with RDDs.
- Developed Spark code using Scala and Spark-SQL for faster processing of data.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Analyzed the SQL scripts and redesigned the solution for implementation in Scala.
- Exported processed data back to the Teradata staging layer using Sqoop.
- Loaded data from the edge node to Teradata stage tables using the Teradata MultiLoad (MLoad) utility.
- Exported Teradata Tables from Stage Layer to Core Layer using BTEQ Scripts.
- Performed extensive unit testing by creating test cases.
- Supported Business users during UAT.
- Experience in development methodologies like Agile and Waterfall.
- Supported QA Team by fixing defects logged in Quality Center.
- Experience in code repositories like Github, Nexus.
- Developed Business Objects Universe by connecting to Teradata Core tables.
- Experience in promoting code to QA using Accurev.
- Experience in Building Jobs and promoting code to QA using Jenkins Cloudbase Tool.
- Supporting Hadoop App Support team by writing SIS documents and code drop requests.
Environment: Apache Hadoop, Java, Python, Eclipse, Hortonworks, Spark, HBase, Cassandra, MapReduce, Pig, Scala, Hive, Kafka, Teradata, Sqoop, HCatalog, Hue, Linux, Jenkins, Github, Nexus, Control-M
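A minimal sketch of the null-check profiling step mentioned above (the delimiter and column naming are hypothetical): the mapper emits one count per column position whose value is empty or the literal "NULL", so a summing reducer produces a NULL profile per column.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NullCheckMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text columnKey = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumes pipe-delimited records; -1 keeps trailing empty fields.
        String[] fields = value.toString().split("\\|", -1);
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            if (field.isEmpty() || field.equalsIgnoreCase("NULL")) {
                columnKey.set("column_" + i);
                context.write(columnKey, ONE);
            }
        }
    }
}
```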
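A minimal sketch of the Kafka-to-Cassandra flow mentioned above, assuming Storm 1.x packages and the DataStax Java driver 3.x; the keyspace, table and field names are hypothetical, and the Kafka spout is assumed to be configured elsewhere and to emit a single string field named "value".

```java
import java.util.Map;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class CassandraWriterBolt extends BaseRichBolt {
    private transient Session session;
    private transient PreparedStatement insert;
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        Cluster cluster = Cluster.builder().addContactPoint("cassandra-host").build();
        session = cluster.connect("events_ks");
        insert = session.prepare("INSERT INTO raw_events (event_id, payload) VALUES (uuid(), ?)");
    }

    @Override
    public void execute(Tuple tuple) {
        String payload = tuple.getStringByField("value");
        session.execute(insert.bind(payload));   // write each Kafka event to Cassandra
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt: nothing emitted downstream
    }
}

// Topology wiring (sketch): a Kafka spout feeding the Cassandra bolt.
// TopologyBuilder builder = new TopologyBuilder();
// builder.setSpout("kafka-spout", kafkaSpout, 1);
// builder.setBolt("cassandra-bolt", new CassandraWriterBolt(), 2).shuffleGrouping("kafka-spout");
```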
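A minimal Pig UDF sketch for the date-format conversion mentioned above (the source and target formats are assumptions): it parses "yyyyMMdd" input and returns "yyyy-MM-dd", yielding null for unparseable values rather than failing the job.

```java
import java.io.IOException;
import java.text.SimpleDateFormat;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class FormatDate extends EvalFunc<String> {
    private final SimpleDateFormat source = new SimpleDateFormat("yyyyMMdd");
    private final SimpleDateFormat target = new SimpleDateFormat("yyyy-MM-dd");

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        try {
            return target.format(source.parse(input.get(0).toString()));
        } catch (java.text.ParseException e) {
            return null; // keep bad dates as NULL rather than failing the load
        }
    }
}
```

In Pig Latin such a UDF is made available with REGISTER and then invoked like any built-in function inside a FOREACH ... GENERATE.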
Confidential, Charlotte, NC
Sr. Hadoop Developer
Responsibilities:- Worked on different Big Data tools including Pig, Hive, HBase and Sqoop.
- Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Extensively involved in Design phase and delivered Design documents.
- Involved in Testing and coordination with business in User testing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Developed multiple POCs using Scala and deployed them on the Hadoop cluster, comparing the performance of Spark with Hive and SQL/Teradata (see the Spark sketch after this section).
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Implemented Partitioning, Dynamic Partitions and Bucketing in Hive for efficient data access.
- Experienced in defining job flows.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used Pig as an ETL tool to perform transformations, joins and some pre-aggregations before storing the data onto HDFS.
- Loaded and transformed large sets of structured and semi-structured data.
- Used Github as a code repository and version control tool.
- Responsible for managing data coming from different sources.
- Involved in creating Hive Tables, loading data and writing Hive queries.
- Involved in Unit testing and delivered Unit test plans and results documents.
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Worked on Oozie workflow engine for job scheduling.
- Implemented SCD Type-2 tables in Hadoop/HBase.
Environment: Apache Hadoop, HDFS, MapReduce, HBase, Java, Scala, Linux, Sqoop, Hive, Pig, Python, NoSQL, Flume, Oozie, Github.
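A minimal sketch of the kind of POC comparison mentioned above, shown here with Spark's Java API rather than the Scala used on the project (the path and filter are hypothetical): it counts ERROR records in HDFS log data, an aggregation that can equally be expressed as a Hive query for the performance comparison.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LogErrorCountPoc {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("log-error-count-poc");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Same input the Hive external table would point at (hypothetical path).
        JavaRDD<String> lines = sc.textFile("hdfs:///data/logs/app/*");
        long errorCount = lines.filter(line -> line.contains("ERROR")).count();

        System.out.println("ERROR records: " + errorCount);
        sc.stop();
    }
}
```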
Confidential, Minneapolis, MN
Hadoop Developer
Responsibilities:- Installed, configured, and maintained Apache Hadoop clusters for application development and major components of Hadoop Ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie and Zookeeper
- Importing and exporting data into HDFS and Hive from different RDBMS using Sqoop
- Experienced in defining job flows to run multiple MapReduce and Pig jobs using Oozie
- Importing log files using Flume into HDFS and load into Hive tables to query data
- Used HBase-Hive integration, written multiple Hive UDFs for complex queries
- Involved in writing APIs to read HBase tables, cleanse data and write to another HBase table (see the HBase client sketch after this section)
- Created multiple Hive tables, implemented Partitioning, Dynamic Partitioning and Buckets in Hive for efficient data access
- Written multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats
- Experienced in running batch processes using Pig Scripts and developed Pig UDFs for data manipulation according to Business Requirements
- Experienced in writing programs using HBase Client API
- Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop
- Experienced in design, development, tuning and maintenance of NoSQL database
- Developed unit test cases for Hadoop MapReduce jobs with MRUnit (see the MRUnit sketch after this section)
- Excellent experience in ETL analysis, designing, developing, testing and implementing ETL processes including performance tuning and query optimizing of database
Environment: Apache Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Java, Linux, MySQL Server, MS SQL, SQL, PL/SQL, NoSQL.
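A minimal HBase Client API sketch of the read-cleanse-write pattern mentioned above, assuming the HBase 1.x client (the table, column family and qualifier names are hypothetical): it scans a source table, cleanses one column, and writes the result to a target table under the same row key.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class CleanseCopy {
    private static final byte[] CF  = Bytes.toBytes("d");
    private static final byte[] COL = Bytes.toBytes("payload");

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table source = connection.getTable(TableName.valueOf("raw_events"));
             Table target = connection.getTable(TableName.valueOf("clean_events"));
             ResultScanner scanner = source.getScanner(new Scan())) {

            for (Result row : scanner) {
                byte[] raw = row.getValue(CF, COL);
                if (raw == null) {
                    continue;                        // nothing to cleanse for this row
                }
                String cleaned = new String(raw).trim().toLowerCase();
                Put put = new Put(row.getRow());     // keep the same row key
                put.addColumn(CF, COL, Bytes.toBytes(cleaned));
                target.put(put);
            }
        }
    }
}
```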
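A minimal MRUnit sketch of the unit-testing approach mentioned above (the mapper and record layout are hypothetical): the driver feeds one record to the mapper in memory and asserts the expected key/value output, with no cluster required.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class StatusCountMapperTest {

    // Tiny mapper under test: emits (status, 1) from CSV records shaped "id,status,amount".
    public static class StatusCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            context.write(new Text(fields[1]), new IntWritable(1));
        }
    }

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new StatusCountMapper());
    }

    @Test
    public void emitsStatusWithCountOfOne() throws IOException {
        mapDriver.withInput(new LongWritable(1L), new Text("1001,SHIPPED,49.99"))
                 .withOutput(new Text("SHIPPED"), new IntWritable(1))
                 .runTest();
    }
}
```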
Confidential
Java Developer
Responsibilities:- Transformation of XML to HTML documents using XSLT style sheet.
- Developed frontend Modules using MVC architecture using JSF 2.0.
- Used XSLT to develop templates and process XML data into a more user-friendly format.
- Programming and development of modules involving Struts, JPA, Spring, AJAX, Servlets, JSP, JSTL, jQuery and JavaScript.
- Optimization of Hibernate mapping in order to boost performance of the system.
- High-level design of SOA components to complete end-to-end B2B integration.
- Managed and deployed the application using JBoss Application Server 6.1 with the deployment manager in a clustered environment.
- Developed views using JSPs and Struts tags; used the Tiles framework to improve UI flexibility and provide a single point of maintenance.
- Developed the code for asynchronous update to web page using JavaScript and Ajax.
- Developed application using JavaScript for Web pages to add functionality, validate forms, communicate with the server.
- Used Spring IoC, writing JavaBean classes with get and set methods for each property to be configured by Spring.
- Implemented SOAP and REST based web services using the Apache CXF framework.
- Modified the configuration of the Spring Application Framework IOC Container.
- Used the Hibernate ORM framework as the persistence engine; actively engaged in mapping and Hibernate queries (see the mapping sketch after this section).
- Involved in writing Hibernate mapping files (HBM files) and configuration files.
- Used Log4j for logging Errors.
- Wrote extensive JUnit test cases to test the application.
- Implemented a logging mechanism using Log4j with the help of the Spring AOP framework.
- Performed server-side validations using the Struts Validator framework.
Environment: Eclipse, NetBeans, NoSQL, PL/SQL developer, Filezilla, Putty, SOAP UI, Java, JavaScript, XML, HTML, JSP
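A minimal sketch of the Hibernate persistence pattern mentioned above (the entity, columns and query are hypothetical): a JavaBean-style entity assumed to be mapped through a Customer.hbm.xml file, queried through the Session API.

```java
import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class CustomerDao {

    // JavaBean-style entity with getters/setters, mapped via Customer.hbm.xml (assumed).
    public static class Customer {
        private Long id;
        private String name;
        private String status;

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public String getStatus() { return status; }
        public void setStatus(String status) { this.status = status; }
    }

    private final SessionFactory sessionFactory;

    public CustomerDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public List<Customer> findActiveCustomers() {
        Session session = sessionFactory.openSession();
        try {
            Transaction tx = session.beginTransaction();
            @SuppressWarnings("unchecked")
            List<Customer> result = session
                    .createQuery("from Customer c where c.status = :status")
                    .setParameter("status", "ACTIVE")
                    .list();
            tx.commit();
            return result;
        } finally {
            session.close();
        }
    }
}
```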
Confidential
Java Developer
Responsibilities:- Gathered user requirements followed by analysis and design. Evaluated various technologies for the client.
- Developed HTML and JSP to present Client side GUI.
- Involved in development of JavaScript code for client side Validations.
- Designed and developed the HTML-based web pages for displaying the reports.
- Developed java classes and JSP files.
- Extensively used JSF framework.
- Extensively used XML documents with XSLT and CSS to translate the content into HTML for presentation in the GUI.
- Developed dynamic content of presentation layer using JSP.
- Developed user-defined tags using XML.
- Developed, Tested and Debugged the Java, JSP and EJB components using Eclipse.
- Developed JSPs as the view, Servlets as the controller and EJBs as the model in the Struts framework (see the servlet sketch after this section).
- Created and implemented PL/SQL stored procedures, triggers.
Environment: Java, J2EE, EJB 2.1, JSP 2.0, Servlets 2.4, JNDI 1.2, JavaMail 1.2, JDBC 3.0, Struts, HTML, XML, CORBA, XSLT, JavaScript, Eclipse 3.2, Oracle 10g, WebLogic 8.1.
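A minimal sketch of the MVC wiring mentioned above (the report data and paths are hypothetical): a controller servlet that gathers report data and forwards to a JSP view, standing in for the Struts action/JSP pairing used on the project.

```java
import java.io.IOException;
import java.util.Arrays;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ReportController extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // In the real flow this data would come from the EJB model layer.
        request.setAttribute("reportRows", Arrays.asList("North: 120", "South: 95"));
        request.getRequestDispatcher("/WEB-INF/jsp/report.jsp").forward(request, response);
    }
}
```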