Hadoop Developer Resume
Coon Rapids, MN
PROFESSIONAL SUMMARY:
- 5+ years of professional experience in the IT industry, involved in developing, implementing, configuring and testing Hadoop ecosystem components and maintaining various web-based applications.
- Around 4 years of experience ranging from Data Warehousing to Big Data tools such as Hadoop and Spark.
- Developed and delivered Big Data solutions for 4 different clients using tools such as Hadoop, Spark, Hive, HBase, SQL, Pig, Talend, Kafka, Flume, Sqoop, Oozie and Azkaban.
- Good at conceptualizing and building solutions quickly; recently developed a Data Lake using a pub-sub architecture. Developed a pipeline using Java, Kafka and Python to load data from a JMS server into Hive, with automatic ingestion and quality audits of the data into the RAW layer of the Data Lake.
- Hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, Spark SQL, HBase, Oozie, ZooKeeper, Sqoop, Flume, Impala, Kafka and Storm.
- Excellent understanding of Hadoop distributed system architecture and design principles.
- Experience in analyzing data using Hive, Pig Latin, HBase and custom MapReduce programs.
- Excellent knowledge of distributed storage (HDFS) and distributed processing (MapReduce, YARN).
- Hands-on experience in developing MapReduce programs according to requirements.
- Hands-on experience in data cleaning and pre-processing using Java and the Talend data preparation tool.
- Hands-on Data Warehousing experience with Extraction, Transformation and Loading (ETL) processes using Talend Open Studio for Data Integration and Big Data.
- Expertise in optimizing traffic across the network using Combiners, joining datasets with different schemas using Joins, and organizing data using Partitioners and Buckets.
- Expertise with NoSQL databases such as HBase; expertise in using Talend for ETL purposes (data migration, cleansing).
- Expertise in working with different kinds of data files such as XML and JSON, as well as databases.
- Hands-on experience in importing data from databases such as MySQL, Oracle, Teradata and DB2 into HDFS, and exporting it back, using Sqoop.
- Experience in working with different file formats and compression techniques in Hadoop
- Used Pig as an ETL tool for transformations, event joins, filtering and pre-aggregation.
- Extended Hive and Pig core functionality by writing Pig Latin UDFs in Java, and used various UDFs from Piggybank and other sources.
- Good knowledge of database connectivity (JDBC) for databases like Oracle, DB2, SQL Server, MySQL.
- Worked extensively on different Hadoop distributions like CDH and Hortonworks
- Strong experience working with real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Kafka, Flume, MapReduce and Hive.
- Expertise in extending Hive and Pig core functionalities by writing custom User Defined Functions (UDF)
- Proficient in using IDEs such as Eclipse, MyEclipse and NetBeans.
- Hands-on experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Sound knowledge in programming Spark using Scala
- Excellent Programming skills at a higher level of abstraction using Scala and Spark
- Good understanding in processing of real-time data using Spark
- Hands-on experience with the build management tools Maven and Ant.
- Experience in all phases of software development life cycle
- Extensive experience in using Oracle, SQL Server, DB2 and MySQL databases.
- Extensive experience utilizing Agile methodologies throughout the software development process.
- Supported development, testing, and operations teams during new system deployments.
- Very good knowledge of SAP BI (ETL) and data warehouse tools.
- Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka and Flume.
- Involved in Hadoop testing.
- Hands-on experience with UNIX scripting.
- Responsible for designing and implementing ETL processes using Talend to load data from different sources.
- Experience working with cloud configuration in Amazon Web Services (AWS).
- Good knowledge of web services.
- Experience in reporting using SSRS, SAP BO, and Tableau.
- Knowledge of SSIS, SSRS, SSAS and SAP BI.
TECHNICAL SKILLS:
Big Data technologies/Hadoop Ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Cassandra, Kafka, Scala, Spark and Storm
Programming Languages: Java, SQL, PL/SQL, Python, Scala, Linux shell scripting.
Web Services: MVC, SOAP, REST
RDBMS: Oracle 10g, MySQL, SQL Server, DB2.
NoSQL: HBase, Cassandra.
Databases: Oracle, Teradata, DB2, MS Azure.
ETL tools: Talend, SSIS
Tools Used: Eclipse, PuTTY, Pentaho, MS Office, Crystal Reports, Falcon and Ranger
Development Strategies: Agile, Waterfall and Test-Driven
PROFESSIONAL EXPERIENCE:
Confidential, Coon Rapids, MN
Hadoop Developer
Responsibilities:
- Developed a 16-node cluster for the Data Lake using the Hortonworks distribution.
- Worked on building a channel from Sonic JMS to consume messages from the SOAP application.
- Responsible for data audit and data quality, and the first point of contact for issues in data from different sources.
- Involved in designing the data pipeline end to end to ingest data into the Data Lake.
- Developed a Kafka producer which brings data streams from the JMS client and passes them to the Kafka consumer (see the producer sketch after this list).
- Worked on programming the Kafka producer and consumer with the connection parameters and methods, from Oracle Sonic JMS through to the Data Lake on HDFS.
- Developed Python code to validate input XML files and separate bad data before ingestion into Hive and the Data Lake (see the validation sketch after this list).
- Responsible for developing the parser which takes the Kafka streaming data, segregates it by Hive table, converts it from XML to text files and ingests it into the RAW layer.
- Designed, developed and delivered jobs and transformations to enrich the data and progressively promote it for consumption in the PUB layer of the Data Lake.
- Worked on sequence files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
- Responsible for developing unit test cases from the Kafka producer to the consumer, establishing the connection parameters from JMS to Kafka.
- Worked on handling different data formats including XML, JSON, ORC, Parquet.
- Strong experience with Hadoop file system and Hive commands for data mappings.
- Worked on ingestion from RDBMS sources using Sqoop and Flume.
- Worked on designing the Hive table structures, including creation, partitioning and transformations over the data based on business requirements, to enrich data in the 3-layered BI Data Lake.
- Developed documentation for individual modules such as Application, Hive, Operations and Housekeeping, covering system integration and error handling.
- Worked in a team to Kerberize Kafka to reduce the risk of enterprise security and authorization issues.
- Worked on Hive data modelling and dedicated a separate data hub for multiple Hadoop datasets.
- Transformed the existing ETL logic to Hadoop mappings.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Developed code to read multiple data formats on HDFS using PySpark (see the PySpark sketch after this list).
- Experienced in analyzing SQL scripts and designing solutions to implement them using PySpark.
- Utilized capabilities of Tableau such as Data extracts, Data blending, Forecasting, Dashboard actions and table calculations to build dashboards.
- Developed extremely complex dashboards using Tableau table calculations, quick filters, context filters, hierarchies, parameters and action filters.
- Published customized interactive reports and dashboards with report scheduling using Tableau Server.
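A minimal producer sketch for the JMS-to-Kafka leg of the pipeline described above, assuming messages are already being pulled from Sonic JMS by a hypothetical fetch_jms_messages() helper; the broker address and topic name are placeholders, and the kafka-python client is used here for illustration.
```python
# Hedged sketch: bridge XML messages from a JMS source into a Kafka topic.
# fetch_jms_messages() is a hypothetical stand-in for the Sonic JMS consumer;
# the broker address and topic name are placeholders.
from kafka import KafkaProducer

BOOTSTRAP_SERVERS = ["kafka-broker:9092"]   # placeholder broker
RAW_TOPIC = "datalake.raw.xml"              # placeholder topic

def fetch_jms_messages():
    """Hypothetical helper that yields XML payloads pulled from the JMS queue."""
    yield "<order><id>1</id></order>"

def main():
    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        value_serializer=lambda xml: xml.encode("utf-8"),
        acks="all",            # wait for full acknowledgement before moving on
        retries=3,
    )
    for xml_message in fetch_jms_messages():
        # Each JMS payload is forwarded as-is; downstream consumers parse the XML.
        producer.send(RAW_TOPIC, value=xml_message)
    producer.flush()           # ensure everything is delivered before exiting
    producer.close()

if __name__ == "__main__":
    main()
```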
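A minimal validation sketch of the Python pre-ingestion check mentioned above: well-formedness testing with the standard library's xml.etree, quarantining files that fail to parse so they never reach the RAW layer. Directory paths are placeholders and the real audit rules are not reproduced.
```python
# Hedged sketch: separate malformed XML files from good ones before ingestion.
# Paths are placeholders; the pipeline's full quality-audit rules are omitted.
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

INBOUND_DIR = Path("/data/inbound")    # placeholder landing directory
REJECT_DIR = Path("/data/rejected")    # placeholder quarantine directory

def is_well_formed(xml_path: Path) -> bool:
    """Return True if the file parses as XML, False otherwise."""
    try:
        ET.parse(xml_path)
        return True
    except ET.ParseError:
        return False

def validate_batch():
    REJECT_DIR.mkdir(parents=True, exist_ok=True)
    good, bad = [], []
    for xml_file in INBOUND_DIR.glob("*.xml"):
        if is_well_formed(xml_file):
            good.append(xml_file)
        else:
            # Quarantine bad data so it never reaches the Hive RAW layer.
            shutil.move(str(xml_file), REJECT_DIR / xml_file.name)
            bad.append(xml_file.name)
    return good, bad

if __name__ == "__main__":
    accepted, rejected = validate_batch()
    print(f"accepted={len(accepted)} rejected={len(rejected)}")
```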
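A minimal PySpark sketch combining two of the items above: reading several file formats from HDFS and landing an aggregate into a partitioned, ORC-backed Hive table. Table, column and path names are illustrative, and Hive support in the SparkSession is assumed.
```python
# Hedged sketch: read multiple formats from HDFS and write a partitioned Hive
# table. Paths, table and column names are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("raw-to-pub-enrichment")
         .enableHiveSupport()        # assumes the cluster exposes the Hive metastore
         .getOrCreate())

# Reading multiple data formats stored on HDFS.
json_df = spark.read.json("hdfs:///datalake/raw/events_json/")
orc_df = spark.read.orc("hdfs:///datalake/raw/events_orc/")
parquet_df = spark.read.parquet("hdfs:///datalake/raw/events_parquet/")

# A Hive-style aggregate expressed as DataFrame transformations
# (equivalent to: SELECT event_type, event_date, count(*) ... GROUP BY ...).
summary = (json_df
           .filter(F.col("event_type").isNotNull())
           .groupBy("event_type", "event_date")
           .agg(F.count("*").alias("event_count")))

# Write into a partitioned ORC table so Hive queries can prune by event_date.
(summary.write
        .mode("overwrite")
        .format("orc")
        .partitionBy("event_date")
        .saveAsTable("pub.event_summary"))
```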
Environment: Hadoop frameworks, HDP 2.5.3, Kafka, Hive, Sqoop, Python, Spark, Shell Scripting, Oracle Sonic JMS, Java 8.0 & 7.0, Eclipse, Tableau.
Confidential, Edina, MN
Hadoop Developer
Responsibilities:
- Played a key role in a team of 3 migrating the existing RDBMS system to Hadoop.
- Developed a data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Used Pig for transformations, event joins, filtering bot traffic and pre-aggregations before storing the data onto HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Good experience in developing Hive DDLs to create, alter and drop Hive tables.
- Involved in developing Hive UDFs for needed functionality that is not available out of the box in Apache Hive.
- Implemented Kafka-Storm topologies capable of handling and channeling high-volume data streams, and integrated the Storm topologies to filter and process that data across multiple clusters for complex event processing.
- Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
- Developed Sqoop scripts for importing and exporting data into HDFS and Hive (see the Sqoop sketch after this list).
- Responsible for processing ingested raw data using Kafka and Hive.
- Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Load balancing of ETL processes, database performance tuning and Capacity monitoring using Talend.
- Involved in pivoting HDFS data from rows to columns and columns to rows (see the pivot sketch after this list).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Used Sqoop to import customer information data from SQL server database into HDFS for data processing.
- Loaded and transformed large sets of structured and semi-structured data using Pig scripts.
- Involved in loading data from UNIX file system to HDFS.
- Handled importing of data from various data sources, performed transformations using Hive, Map Reduce, Spark and loaded data into HDFS.
- Worked on the core and Spark SQL modules of Spark extensively.
- Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get or copyToLocal.
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
- Took part in a couple of workshops on Spark, RDDs and Spark Streaming.
- Discussed implementation-level details of concurrent programming in Spark using Python with message passing.
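A minimal Sqoop sketch for the imports referred to above, wrapped in Python so it matches the other examples here; the JDBC URL, credentials, table and directories are placeholders, and a real job would typically live in a shell script or an Oozie action instead.
```python
# Hedged sketch: incremental Sqoop import from SQL Server into HDFS, driven
# from Python via subprocess. Connection details and paths are placeholders.
import subprocess

SQOOP_IMPORT = [
    "sqoop", "import",
    "--connect", "jdbc:sqlserver://db-host:1433;databaseName=customers",  # placeholder
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_password",  # keep credentials off the command line
    "--table", "customer_info",
    "--target-dir", "/datalake/raw/customer_info",
    "--incremental", "append",        # only pull rows newer than the last run
    "--check-column", "customer_id",
    "--last-value", "0",
    "--num-mappers", "4",
]

def run_import():
    # Raises CalledProcessError if Sqoop exits non-zero, so failures surface in the scheduler.
    subprocess.run(SQOOP_IMPORT, check=True)

if __name__ == "__main__":
    run_import()
```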
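A minimal pivot sketch in PySpark for the row/column reshaping mentioned above; column names and sample values are illustrative only.
```python
# Hedged sketch: rows-to-columns with pivot(), columns-to-rows with stack().
# Column names and sample values are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pivot-demo").getOrCreate()

long_df = spark.createDataFrame(
    [("c1", "balance", 100.0), ("c1", "credit", 50.0), ("c2", "balance", 75.0)],
    ["customer_id", "attribute", "value"],
)

# Rows -> columns: one column per distinct attribute value.
wide_df = (long_df.groupBy("customer_id")
                  .pivot("attribute")
                  .agg(F.first("value")))

# Columns -> rows: stack() folds the attribute columns back into key/value pairs.
back_to_long = wide_df.selectExpr(
    "customer_id",
    "stack(2, 'balance', balance, 'credit', credit) as (attribute, value)",
)

wide_df.show()
back_to_long.show()
```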
Environment: Hadoop, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Spark, Hortonworks.
Confidential, Minneapolis, MN
Hadoop Developer
Responsibilities:
- Involved in writing MapReduce jobs.
- Ingested data into HDFS using Sqoop and scheduled incremental loads to HDFS.
- Developed Pig scripts for transforming data, making extensive use of event joins, filtering and pre-aggregations.
- Developed Pig UDFs for needed functionality that is not available out of the box in Apache Pig.
- Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
- Good experience in developing Hive DDLs to create, alter and drop Hive tables.
- Developed Hive UDFs.
- Implemented Kafka for streaming data, and filtered and processed the data (see the consumer sketch after this list).
- Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Developed Shell scripts for scheduling and automating the job flow.
- Developed a workflow using Oozie to automate the tasks of loading the data into HDFS.
- Developed MapReduce jobs to calculate the total data usage by commercial routers in different locations, and MapReduce programs for data sorting in HDFS (see the streaming sketch after this list).
- Monitored the cluster (jobs, performance) and fine-tuned it when necessary using tools such as Cloudera Manager and Ambari.
- Load balancing of ETL processes and database performance tuning using ETL processing tools.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Optimized Hive queries to extract customer information from HDFS or HBase.
- Developed Pig Latin scripts to aggregate the business clients' log files using Kibana.
- Performed data scrubbing and processing, with Oozie for workflow automation and coordination.
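A minimal consumer sketch for the Kafka filtering mentioned above, using the kafka-python client: read a stream, drop records that fail a simple check, and forward the rest. Topic names, the broker address and the filter rule are placeholders.
```python
# Hedged sketch: consume a Kafka stream, filter out unwanted records, and
# forward the rest to a second topic. Names and the filter rule are placeholders.
import json
from kafka import KafkaConsumer, KafkaProducer

BOOTSTRAP_SERVERS = ["kafka-broker:9092"]   # placeholder broker
SOURCE_TOPIC = "weblogs.raw"                # placeholder topics
CLEAN_TOPIC = "weblogs.clean"

def is_valid(record: dict) -> bool:
    """Placeholder filter: keep records that carry a client_id and a status code."""
    return bool(record.get("client_id")) and "status" in record

def main():
    consumer = KafkaConsumer(
        SOURCE_TOPIC,
        bootstrap_servers=BOOTSTRAP_SERVERS,
        group_id="weblog-filter",
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        value_serializer=lambda rec: json.dumps(rec).encode("utf-8"),
    )
    for message in consumer:
        record = message.value
        if is_valid(record):
            producer.send(CLEAN_TOPIC, value=record)
        # invalid records are simply dropped in this sketch

if __name__ == "__main__":
    main()
```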
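The per-router usage calculation above would typically be a Java MapReduce job; to stay in Python, here is an equivalent Hadoop Streaming sketch with a mapper that emits router-id/byte-count pairs and a reducer that sums them. The input layout (router id and byte count as the first two tab-separated fields) is an assumption.
```python
#!/usr/bin/env python
# Hedged Hadoop Streaming sketch for summing data usage per router.
# Assumes each input line starts with "<router_id>\t<bytes_used>..."; run it as
# both mapper and reducer, e.g.:
#   hadoop jar hadoop-streaming.jar \
#     -input /data/router_logs -output /data/router_usage \
#     -mapper "python usage.py map" -reducer "python usage.py reduce" -file usage.py
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1].isdigit():
            # Emit key/value pairs; the framework sorts them by router_id.
            print(f"{fields[0]}\t{fields[1]}")

def reducer():
    current_router, total = None, 0
    for line in sys.stdin:
        router_id, bytes_used = line.rstrip("\n").split("\t")
        if router_id != current_router:
            if current_router is not None:
                print(f"{current_router}\t{total}")
            current_router, total = router_id, 0
        total += int(bytes_used)
    if current_router is not None:
        print(f"{current_router}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```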
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Storm, Flume, Core Java, Cloudera HDFS, Eclipse.
Confidential, Saint Paul, MN
Java Developer
Responsibilities:
- Interacted with the team on analysis, design and development of the database using ER diagrams, normalization and relational database concepts.
- Involved in Design, Development and testing of the system.
- Developed SQL Server stored procedures and tuned SQL queries (using indexes and execution plans).
- Developed User Defined Functions and created Views.
- Created Triggers to maintain the Referential Integrity.
- Implemented Exception Handling.
- Worked on client requirement and wrote Complex SQL Queries to generate Crystal Reports.
- Created and automated regular jobs.
- Tuned and Optimized SQL Queries using Execution Plan and Profiler.
- Developed user interface (view component of MVC architecture) with JSP, Struts Custom Tag libraries, HTML5 and JavaScript.
- Used the Dojo toolkit to construct Ajax requests and build dynamic web pages using JSPs, DHTML and JavaScript; extensively used jQuery in web-based applications.
- Developed the controller component with Servlets and action classes.
- Developed business components (model components) using Enterprise JavaBeans (EJB).
- Established schedule and resource requirements by planning, analyzing and documenting development effort, including timelines, risks, test requirements and performance targets.
- Analyzing System Requirements and preparing System Design document
- Developing dynamic User Interface with HTML and JavaScript using JSP and Servlet Technology
- Designed and developed a sub system where Java Messaging Service (JMS) applications are developed to communicate with MQ in data exchange between different systems
- Used JMS elements for sending and receiving messages
- Used Hibernate for mapping Java classes to database tables.
- Created and executed test plans using Quality Center (TestDirector).
- Mapped requirements with the Test cases in the Quality Center
- Supporting System Test and User acceptance test
- Rebuilding Indexes and Tables as part of Performance Tuning Exercise.
- Involved in performing database Backup and Recovery.
- Worked on Documentation using MS word.
Environment: SQL Server 7.0/2000, SQL, T-SQL, BCP, Visual Basic 6.0/5.0, Crystal Reports 7/4.5, Java, J2ee, JDBC, EJB, JSP, EL, JSTL, JUNIT, XML.