Hadoop Developer Resume
Coon Rapids, MN
PROFESSIONAL SUMMARY:
- 5+ years of professional experience in the IT industry, involved in developing, implementing, configuring and testing Hadoop ecosystem components and maintaining various web-based applications.
- Around 4 years of experience ranging from Data Warehousing to Big Data tools such as Hadoop and Spark.
- Developed and delivered Big Data solutions for 4 different clients using tools such as Hadoop, Spark, Hive, HBase, SQL, Pig, Talend, Kafka, Flume, Sqoop, Oozie and Azkaban.
- Good at conceptualizing and building solutions quickly; recently developed a Data Lake using a pub-sub architecture. Developed a pipeline using Java, Kafka and Python to load data from a JMS server into Hive, with automatic ingestion and quality audits of the data into the RAW layer of the Data Lake.
- Hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, Spark SQL, HBase, Oozie, ZooKeeper, Sqoop, Flume, Impala, Kafka and Storm.
- Excellent understanding of Hadoop distributed system architecture and design principles.
- Experience in analyzing data using Hive, Pig Latin, HBase and custom MapReduce programs.
- Excellent knowledge of distributed storage (HDFS) and distributed processing (MapReduce, YARN).
- Hands-on experience in developing MapReduce programs according to requirements.
- Hands-on experience in data cleaning and pre-processing using Java and the Talend data preparation tool.
- Hands-on Data Warehousing experience with Extraction, Transformation and Loading (ETL) processes using Talend Open Studio for Data Integration and Big Data.
- Expertise in optimizing traffic across the network using Combiners, joining datasets with different schemas using Joins, and organizing data using Partitioners and Buckets.
- Expertise with NoSQL databases such as HBase; expertise in using Talend for ETL purposes (data migration, cleansing).
- Expertise in working with different kinds of data files such as XML and JSON, as well as databases.
- Hands-on experience in importing data from databases such as MySQL, Oracle, Teradata and DB2 into HDFS, and exporting it back, using Sqoop.
- Experience in working with different file formats and compression techniques in Hadoop
- Used Pig as an ETL tool for transformations, event joins, filtering and pre-aggregation.
- Extended Hive and Pig core functionality by writing Pig Latin UDFs in Java, and used various UDFs from Piggybank and other sources.
- Good knowledge of database connectivity (JDBC) for databases like Oracle, DB2, SQL Server, MySQL.
- Worked extensively on different Hadoop distributions like CDH and Hortonworks
- Strong experience working with real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Kafka, Flume, MapReduce and Hive.
- Expertise in extending Hive and Pig core functionalities by writing custom User Defined Functions (UDF)
- Proficient in using IDEs such as Eclipse, MyEclipse and NetBeans.
- Hands-on experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Sound knowledge in programming Spark using Scala
- Excellent Programming skills at a higher level of abstraction using Scala and Spark
- Good understanding in processing of real-time data using Spark
- Hands-on experience with the build management tools Maven and Ant.
- Experience in all phases of software development life cycle
- Extensive experience in using Oracle, SQL Server, DB2 and MySQL databases.
- Extensive experience utilizing Agile methodologies throughout the software development process.
- Supported development, testing, and operations teams during new system deployments.
- Very good knowledge of SAP BI (ETL) and data warehouse tools.
- Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka and Flume.
- Involved in Hadoop testing.
- Hands-on experience with UNIX scripting.
- Responsible for designing and implementing ETL processes using Talend to load data from different sources.
- Experience working with cloud configuration in Amazon Web Services (AWS).
- Good knowledge of web services.
- Experience in reporting using SSRS, SAP BO, and Tableau.
- Knowledge of SSIS, SSRS, SSAS and SAP BI.
TECHNICAL SKILLS:
Big Data technologies/Hadoop Ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Cassandra, Kafka, Scala, Spark and Storm
Programming Languages: Java, SQL, PL/SQL, Python, Scala, Linux shell scripting.
Web Services: MVC, SOAP, REST
RDBMS: Oracle 10g, MySQL, SQL Server, DB2.
NoSQL: HBase, Cassandra.
Databases: Oracle, Teradata, DB2, MS Azure.
ETL tools: Talend, SSIS
Tools Used: Eclipse, PuTTY, Pentaho, MS Office, Crystal Reports, Falcon and Ranger
Development Strategies: Agile, Waterfall and Test-Driven
PROFESSIONAL EXPERIENCE:
Confidential, Coon Rapids, MN
Hadoop Developer
Responsibilities:
- Developed a 16-node cluster for the Data Lake using the Hortonworks distribution.
- Worked on building a channel from Sonic JMS to consume messages from the SOAP application.
- Responsible for data audit and data quality, and the first point of contact for issues in data from different sources.
- Involved in designing the data pipeline end to end to ingest data into the Data Lake.
- Developed a Kafka producer which brings data streams from the JMS client and passes them to the Kafka consumer (see the producer sketch after this list).
- Worked on programming the Kafka producer and consumer with the connection parameters and methods, from Oracle Sonic JMS through to the Data Lake on HDFS.
- Developed Python code to validate input XML files and separate bad data before ingestion into Hive and the Data Lake (see the validation sketch after this list).
- Responsible for developing the parser which takes the Kafka streaming data, segregates it by Hive table, converts it from XML to text files and ingests it into the RAW layer.
- Designed, developed and delivered jobs and transformations to enrich the data and progressively promote it for consumption in the PUB layer of the Data Lake.
- Worked on sequence files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
- Responsible for developing unit test cases from the Kafka producer to the consumer, establishing the connection parameters from JMS to Kafka.
- Worked on handling different data formats including XML, JSON, ORC, Parquet.
- Strong experience with Hadoop file system and Hive commands for data mappings.
- Worked on ingestion from RDBMS sources using Sqoop and Flume.
- Worked on designing the Hive table structures, including creation, partitioning and transformations over the data based on business requirements, to enrich data in the 3-layered BI Data Lake.
- Developed documentation for individual modules such as Application, Hive, Operations and Housekeeping, covering system integration and error handling.
- Worked in a team to Kerberize Kafka to reduce the risk of enterprise security and authorization issues.
- Worked on Hive data modelling and dedicated a separate data hub for multiple Hadoop datasets.
- Transformed the existing ETL logic to Hadoop mappings.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Developed code to read multiple data formats on HDFS using PySpark (see the PySpark sketch after this list).
- Experienced in analyzing SQL scripts and designing solutions to implement them using PySpark.
- Utilized capabilities of Tableau such as Data extracts, Data blending, Forecasting, Dashboard actions and table calculations to build dashboards.
- Developed extremely complex dashboards using Tableau table calculations, quick filters, context filters, hierarchies, parameters and action filters.
- Published customized interactive reports and dashboards with report scheduling using Tableau Server.
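A minimal producer sketch for the JMS-to-Kafka leg of the pipeline described above, assuming messages are already being pulled from Sonic JMS by a hypothetical fetch_jms_messages() helper; the broker address and topic name are placeholders, and the kafka-python client is used here for illustration.
```python
# Hedged sketch: bridge XML messages from a JMS source into a Kafka topic.
# fetch_jms_messages() is a hypothetical stand-in for the Sonic JMS consumer;
# the broker address and topic name are placeholders.
from kafka import KafkaProducer

BOOTSTRAP_SERVERS = ["kafka-broker:9092"]   # placeholder broker
RAW_TOPIC = "datalake.raw.xml"              # placeholder topic

def fetch_jms_messages():
    """Hypothetical helper that yields XML payloads pulled from the JMS queue."""
    yield "<order><id>1</id></order>"

def main():
    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        value_serializer=lambda xml: xml.encode("utf-8"),
        acks="all",            # wait for full acknowledgement before moving on
        retries=3,
    )
    for xml_message in fetch_jms_messages():
        # Each JMS payload is forwarded as-is; downstream consumers parse the XML.
        producer.send(RAW_TOPIC, value=xml_message)
    producer.flush()           # ensure everything is delivered before exiting
    producer.close()

if __name__ == "__main__":
    main()
```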
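A minimal validation sketch of the Python pre-ingestion check mentioned above: well-formedness testing with the standard library's xml.etree, quarantining files that fail to parse so they never reach the RAW layer. Directory paths are placeholders and the real audit rules are not reproduced.
```python
# Hedged sketch: separate malformed XML files from good ones before ingestion.
# Paths are placeholders; the pipeline's full quality-audit rules are omitted.
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

INBOUND_DIR = Path("/data/inbound")    # placeholder landing directory
REJECT_DIR = Path("/data/rejected")    # placeholder quarantine directory

def is_well_formed(xml_path: Path) -> bool:
    """Return True if the file parses as XML, False otherwise."""
    try:
        ET.parse(xml_path)
        return True
    except ET.ParseError:
        return False

def validate_batch():
    REJECT_DIR.mkdir(parents=True, exist_ok=True)
    good, bad = [], []
    for xml_file in INBOUND_DIR.glob("*.xml"):
        if is_well_formed(xml_file):
            good.append(xml_file)
        else:
            # Quarantine bad data so it never reaches the Hive RAW layer.
            shutil.move(str(xml_file), REJECT_DIR / xml_file.name)
            bad.append(xml_file.name)
    return good, bad

if __name__ == "__main__":
    accepted, rejected = validate_batch()
    print(f"accepted={len(accepted)} rejected={len(rejected)}")
```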
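A minimal PySpark sketch combining two of the items above: reading several file formats from HDFS and landing an aggregate into a partitioned, ORC-backed Hive table. Table, column and path names are illustrative, and Hive support in the SparkSession is assumed.
```python
# Hedged sketch: read multiple formats from HDFS and write a partitioned Hive
# table. Paths, table and column names are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("raw-to-pub-enrichment")
         .enableHiveSupport()        # assumes the cluster exposes the Hive metastore
         .getOrCreate())

# Reading multiple data formats stored on HDFS.
json_df = spark.read.json("hdfs:///datalake/raw/events_json/")
orc_df = spark.read.orc("hdfs:///datalake/raw/events_orc/")
parquet_df = spark.read.parquet("hdfs:///datalake/raw/events_parquet/")

# A Hive-style aggregate expressed as DataFrame transformations
# (equivalent to: SELECT event_type, event_date, count(*) ... GROUP BY ...).
summary = (json_df
           .filter(F.col("event_type").isNotNull())
           .groupBy("event_type", "event_date")
           .agg(F.count("*").alias("event_count")))

# Write into a partitioned ORC table so Hive queries can prune by event_date.
(summary.write
        .mode("overwrite")
        .format("orc")
        .partitionBy("event_date")
        .saveAsTable("pub.event_summary"))
```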
Environment: Hadoop frameworks, HDP 2.5.3, Kafka, Hive, Sqoop, Python, Spark, Shell Scripting, Oracle Sonic JMS, Java 8.0 & 7.0, Eclipse, Tableau.
Confidential, Edina, MN
Hadoop Developer
Responsibilities:
- Played a key role in a team of 3 migrating the existing RDBMS system to Hadoop.
- Developed a data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Used Pig for transformations, event joins, filtering bot traffic and pre-aggregations before storing the data onto HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Good experience in developing Hive DDLs to create, alter and drop Hive tables.
- Involved in developing Hive UDFs for needed functionality that is not available out of the box in Apache Hive.
- Implemented Kafka-Storm topologies capable of handling and channeling high-volume data streams, and integrated the Storm topologies to filter and process that data across multiple clusters for complex event processing.
- Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
- Developed Sqoop scripts for importing and exporting data into HDFS and Hive (see the Sqoop sketch after this list).
- Responsible for processing ingested raw data using Kafka and Hive.
- Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Load balancing of ETL processes, database performance tuning and Capacity monitoring using Talend.
- Involved in pivoting HDFS data from rows to columns and columns to rows (see the pivot sketch after this list).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Used Sqoop to import customer information data from SQL server database into HDFS for data processing.
- Loaded and transformed large sets of structured and semi-structured data using Pig scripts.
- Involved in loading data from UNIX file system to HDFS.
- Handled importing of data from various data sources, performed transformations using Hive, Map Reduce, Spark and loaded data into HDFS.
- Worked on the core and Spark SQL modules of Spark extensively.
- Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get or copyToLocal.
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
- Took part in a couple of workshops on Spark, RDDs and Spark Streaming.
- Discussed implementation-level details of concurrent programming in Spark using Python with message passing.
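A minimal Sqoop sketch for the imports referred to above, wrapped in Python so it matches the other examples here; the JDBC URL, credentials, table and directories are placeholders, and a real job would typically live in a shell script or an Oozie action instead.
```python
# Hedged sketch: incremental Sqoop import from SQL Server into HDFS, driven
# from Python via subprocess. Connection details and paths are placeholders.
import subprocess

SQOOP_IMPORT = [
    "sqoop", "import",
    "--connect", "jdbc:sqlserver://db-host:1433;databaseName=customers",  # placeholder
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_password",  # keep credentials off the command line
    "--table", "customer_info",
    "--target-dir", "/datalake/raw/customer_info",
    "--incremental", "append",        # only pull rows newer than the last run
    "--check-column", "customer_id",
    "--last-value", "0",
    "--num-mappers", "4",
]

def run_import():
    # Raises CalledProcessError if Sqoop exits non-zero, so failures surface in the scheduler.
    subprocess.run(SQOOP_IMPORT, check=True)

if __name__ == "__main__":
    run_import()
```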
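A minimal pivot sketch in PySpark for the row/column reshaping mentioned above; column names and sample values are illustrative only.
```python
# Hedged sketch: rows-to-columns with pivot(), columns-to-rows with stack().
# Column names and sample values are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pivot-demo").getOrCreate()

long_df = spark.createDataFrame(
    [("c1", "balance", 100.0), ("c1", "credit", 50.0), ("c2", "balance", 75.0)],
    ["customer_id", "attribute", "value"],
)

# Rows -> columns: one column per distinct attribute value.
wide_df = (long_df.groupBy("customer_id")
                  .pivot("attribute")
                  .agg(F.first("value")))

# Columns -> rows: stack() folds the attribute columns back into key/value pairs.
back_to_long = wide_df.selectExpr(
    "customer_id",
    "stack(2, 'balance', balance, 'credit', credit) as (attribute, value)",
)

wide_df.show()
back_to_long.show()
```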
Environment: Hadoop, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Spark, Hortonworks.
Confidential, Minneapolis, MN
Hadoop Developer
Responsibilities:
- Involved in writing MapReduce jobs.
- Ingested data into HDFS using Sqoop and scheduled incremental loads to HDFS.
- Developed Pig scripts for transforming data, making extensive use of event joins, filtering and pre-aggregations.
- Developed Pig UDFs for needed functionality that is not available out of the box in Apache Pig.
- Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
- Good experience in developing Hive DDLs to create, alter and drop Hive tables.
- Developed Hive UDFs.
- Implemented Kafka for streaming data, and filtered and processed the data (see the consumer sketch after this list).
- Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Developed Shell scripts for scheduling and automating the job flow.
- Developed a workflow using Oozie to automate the tasks of loading the data into HDFS.
- Developed MapReduce jobs to calculate the total data usage by commercial routers in different locations, and MapReduce programs for data sorting in HDFS (see the streaming sketch after this list).
- Monitored the cluster (jobs, performance) and fine-tuned it when necessary using tools such as Cloudera Manager and Ambari.
- Load balancing of ETL processes and database performance tuning using ETL processing tools.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Optimized Hive queries to extract customer information from HDFS or HBase.
- Developed Pig Latin scripts to aggregate the business clients' log files using Kibana.
- Performed data scrubbing and processing, with Oozie for workflow automation and coordination.
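A minimal consumer sketch for the Kafka filtering mentioned above, using the kafka-python client: read a stream, drop records that fail a simple check, and forward the rest. Topic names, the broker address and the filter rule are placeholders.
```python
# Hedged sketch: consume a Kafka stream, filter out unwanted records, and
# forward the rest to a second topic. Names and the filter rule are placeholders.
import json
from kafka import KafkaConsumer, KafkaProducer

BOOTSTRAP_SERVERS = ["kafka-broker:9092"]   # placeholder broker
SOURCE_TOPIC = "weblogs.raw"                # placeholder topics
CLEAN_TOPIC = "weblogs.clean"

def is_valid(record: dict) -> bool:
    """Placeholder filter: keep records that carry a client_id and a status code."""
    return bool(record.get("client_id")) and "status" in record

def main():
    consumer = KafkaConsumer(
        SOURCE_TOPIC,
        bootstrap_servers=BOOTSTRAP_SERVERS,
        group_id="weblog-filter",
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        value_serializer=lambda rec: json.dumps(rec).encode("utf-8"),
    )
    for message in consumer:
        record = message.value
        if is_valid(record):
            producer.send(CLEAN_TOPIC, value=record)
        # invalid records are simply dropped in this sketch

if __name__ == "__main__":
    main()
```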
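The per-router usage calculation above would typically be a Java MapReduce job; to stay in Python, here is an equivalent Hadoop Streaming sketch with a mapper that emits router-id/byte-count pairs and a reducer that sums them. The input layout (router id and byte count as the first two tab-separated fields) is an assumption.
```python
#!/usr/bin/env python
# Hedged Hadoop Streaming sketch for summing data usage per router.
# Assumes each input line starts with "<router_id>\t<bytes_used>..."; run it as
# both mapper and reducer, e.g.:
#   hadoop jar hadoop-streaming.jar \
#     -input /data/router_logs -output /data/router_usage \
#     -mapper "python usage.py map" -reducer "python usage.py reduce" -file usage.py
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1].isdigit():
            # Emit key/value pairs; the framework sorts them by router_id.
            print(f"{fields[0]}\t{fields[1]}")

def reducer():
    current_router, total = None, 0
    for line in sys.stdin:
        router_id, bytes_used = line.rstrip("\n").split("\t")
        if router_id != current_router:
            if current_router is not None:
                print(f"{current_router}\t{total}")
            current_router, total = router_id, 0
        total += int(bytes_used)
    if current_router is not None:
        print(f"{current_router}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```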
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Storm, Flume, Core Java, Cloudera HDFS, Eclipse.
Confidential, Saint Paul, MN
Java Developer
Responsibilities:
- Interacted with the team on analysis, design and development of the database using ER diagrams, normalization and relational database concepts.
- Involved in Design, Development and testing of the system.
- Developed SQL Server stored procedures and tuned SQL queries (using indexes and execution plans).
- Developed User Defined Functions and created Views.
- Created Triggers to maintain the Referential Integrity.
- Implemented Exception Handling.
- Worked on client requirement and wrote Complex SQL Queries to generate Crystal Reports.
- Created and automated regular jobs.
- Tuned and Optimized SQL Queries using Execution Plan and Profiler.
- Developed user interface (view component of MVC architecture) with JSP, Struts Custom Tag libraries, HTML5 and JavaScript.
- Used the Dojo toolkit to construct Ajax requests and build dynamic web pages using JSPs, DHTML and JavaScript; extensively used jQuery in web-based applications.
- Developed the controller component with Servlets and action classes.
- Developed business components (model components) using Enterprise JavaBeans (EJB).
- Established schedule and resource requirements by planning, analyzing and documenting development effort, including timelines, risks, test requirements and performance targets.
- Analyzing System Requirements and preparing System Design document
- Developing dynamic User Interface with HTML and JavaScript using JSP and Servlet Technology
- Designed and developed a sub system where Java Messaging Service (JMS) applications are developed to communicate with MQ in data exchange between different systems
- Used JMS elements for sending and receiving messages
- Used Hibernate for mapping Java classes to database tables.
- Created and executed test plans using Quality Center (TestDirector).
- Mapped requirements with the Test cases in the Quality Center
- Supporting System Test and User acceptance test
- Rebuilding Indexes and Tables as part of Performance Tuning Exercise.
- Involved in performing database Backup and Recovery.
- Worked on Documentation using MS word.
Environment: SQL Server 7.0/2000, SQL, T-SQL, BCP, Visual Basic 6.0/5.0, Crystal Reports 7/4.5, Java, J2ee, JDBC, EJB, JSP, EL, JSTL, JUNIT, XML.