Hadoop Developer Resume
Chicago, IL
SUMMARY:
- 8+ years of professional IT experience in project development, implementation, deployment, and maintenance using Hadoop ecosystem technologies, with domain knowledge in Finance, Banking, Communication, Insurance, Retail, and Healthcare.
- 4+ years of hands-on experience with Hadoop ecosystem technologies such as Confidential, MapReduce, YARN, Spark, Hive, Pig, Oozie, Sqoop, Flume, ZooKeeper, and HBase.
- 4+ years of experience in ingestion, storage, querying, processing, and analysis of Big Data, with hands-on Hadoop ecosystem development including MapReduce, Confidential, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, ZooKeeper, Sqoop, Flume, Oozie, and AWS.
- In-depth understanding of Hadoop architecture and its components such as JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce concepts.
- Proficient in Apache Spark and Apache Storm for processing real-time data.
- Extensive experience programming with Spark Resilient Distributed Datasets (RDDs); see the PySpark sketch at the end of this summary.
- Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
- Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files.
- Extensive experience working with Teradata, Confidential, Netezza, SQL Server, and MySQL databases.
- Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Strong experience with Hadoop distributions including Cloudera, Hortonworks, MapR, and Apache.
- Experience installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera (CDH 5.x) distributions and on Amazon Web Services (AWS).
- Experience with AWS services such as EMR, EC2, S3, CloudFormation, and Redshift for fast and efficient processing of Big Data.
- Working knowledge of Azure HDInsight, a fully managed cloud service for processing massive amounts of data with open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, and R, supporting scenarios such as ETL, data warehousing, machine learning, and IoT.
- Worked on a live 60-node Hadoop cluster running Cloudera CDH4.
- Extensive experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Developed UDF, UDAF, and UDTF functions for Hive and Pig.
- Good knowledge of Hive partitioning and bucketing concepts; designed and managed partitions and created external tables in Hive to optimize performance.
- Good experience with Avro files, RC files, combiners, and counters for best practices and performance improvements.
- Good knowledge of join, grouping, and aggregation concepts, applied to resolve performance issues in Hive and Pig scripts.
- Worked with various tools and IDEs such as Eclipse, IBM Rational, Visio, Apache Ant, MS Office, PL/SQL Developer, and SQL*Plus.
- Experience with application servers such as JBoss, Tomcat, WebLogic, and IBM WebSphere.
- Experience working in an onsite-offshore model.
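Below is a minimal PySpark RDD sketch in the spirit of the RDD programming and performance tuning noted above; the input path, delimiter, and record layout are illustrative assumptions rather than details from any specific engagement.

```python
# Minimal PySpark RDD sketch (illustrative only; the HDFS path and the
# user_id|event|bytes record layout are assumptions, not project specifics).
from pyspark import SparkContext

sc = SparkContext(appName="rdd-summary-sketch")

lines = sc.textFile("hdfs:///data/events/part-*")

bytes_per_user = (
    lines.map(lambda line: line.split("|"))
         .filter(lambda fields: len(fields) == 3)          # drop malformed rows
         .map(lambda fields: (fields[0], int(fields[2])))  # (user_id, bytes)
         .reduceByKey(lambda a, b: a + b)                  # aggregate per user
)

for user, total in bytes_per_user.take(10):
    print(user, total)

sc.stop()
```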
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Solr, Storm, Drill, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy
Hadoop Distributions: Cloudera, MapR, Hortonworks, IBM BigInsights
Languages: Java, Scala, Python, JRuby, SQL, HTML, DHTML, JavaScript, XML and C/C++
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM) and JAXB
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON
Development / Build Tools: Eclipse, Ant, Maven, Gradle, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Confidential
RDBMS: Teradata, Confidential, MS SQL Server, MySQL and DB2
Operating systems: UNIX, LINUX, Mac OS and Windows Variants
Data analytical tools: R, SAS and MATLAB
ETL/Reporting Tools: Tableau, Talend, Informatica and Ab Initio
PROFESSIONAL EXPERIENCE:
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Responsible for development, support, and maintenance of ETL (Extract, Transform and Load) processes using Informatica PowerCenter.
- Interacted with product owners and DBA teams to design the ETL process for the project.
- Developed mappings and workflows to generate staging files.
- Developed various transformations such as Source Qualifier, Sorter, Joiner, Update Strategy, Lookup, Expression, and Sequence Generator for loading data into target tables.
- Performed analysis and provided summaries for business questions, initiating proactive investigations into data issues that impact reporting, business analysis, or program execution.
- Used Excel for data validation, lookups, and pivot tables.
- Used MDM to standardize attributes and perform data cleansing, data consolidation, and data quality checks.
- Researched and recommended a suitable technology stack for Hadoop migration, considering the current enterprise architecture.
- Worked on extracting and enriching HBase data across multiple tables using joins in Spark.
- Worked on writing APIs to load the processed data into HBase tables.
- Replaced existing MapReduce programs with Spark applications written in Scala.
- Built on-premise data pipelines with Kafka and Spark Streaming, consuming the feed from an API streaming gateway REST service (see the streaming sketch at the end of this section).
- Developed Hive UDFs to handle data quality and create filtered datasets for further processing.
- Wrote Sqoop scripts to import data into Hive/Confidential from RDBMS sources.
- Good knowledge of the Kafka Streams API for data transformation.
- Implemented a logging framework, the ELK stack (Elasticsearch, Logstash, and Kibana), on AWS.
- Set up Spark on EMR to process large volumes of data stored in Amazon S3.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Used Talend to create workflows for processing data from multiple source systems.
- Created sample flows in Talend and StreamSets with custom-coded JARs and compared the performance of StreamSets and Kafka Streams.
- Developed Hive queries to analyze data in Confidential and identify issues and behavioral patterns.
- Involved in writing optimized Pig scripts, along with developing and testing Pig Latin scripts.
- Used Python pandas and NumPy modules for data analysis, data scraping, and parsing.
- Deployed applications using Jenkins, integrated with Git version control.
- Participated in regular production support for the analytics platform.
- Used Rally for task/bug tracking.
- Used Git for version control.
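A hedged sketch of the Kafka-to-Spark pipeline described above. The bullet refers to Spark Streaming; this sketch uses the Structured Streaming API as a stand-in and assumes the spark-sql-kafka connector is available. The broker address, topic name, and output paths are placeholders.

```python
# Hedged Kafka -> Spark Structured Streaming sketch; broker, topic, and
# paths below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Read the raw gateway feed from Kafka; key/value arrive as binary.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "gateway-feed")
          .load()
          .select(col("key").cast("string"), col("value").cast("string")))

# Land the stream on HDFS; the checkpoint makes the query restartable.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/gateway")
         .option("checkpointLocation", "hdfs:///checkpoints/gateway")
         .start())

query.awaitTermination()
```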
Environment: MapR, Hadoop, HBase, Confidential, AWS, Pig, Hive, Drill, Spark SQL, MapReduce, Spark Streaming, Kafka, Flume, Sqoop, Oozie, Jupyter Notebook, Docker, Spark, Scala, Talend, Shell Scripting, Java.
Confidential, Richmond, VA
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Implemented Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Developed Spark scripts for data analysis in both Python and Scala.
- Built on-premise data pipelines using Kafka and Spark for real-time data analysis.
- Created reports in Tableau for visualization of the data sets created, and tested native Drill, Impala, and Spark connectors.
- Analyzed the SQL scripts and designed a Scala-based solution to implement them.
- Implemented complex Hive UDFs to execute business logic within Hive queries.
- Responsible for bulk loading data into HBase using MapReduce by directly generating HFiles and loading them.
- Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
- Worked on Solr configuration and customizations based on requirements.
- Handled importing data from different data sources into Confidential using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into Confidential.
- Designed and developed SSIS (ETL) packages to validate, extract, transform, and load data from the OLTP system to the data warehouse and reporting data mart.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume and staged it in Confidential for further analysis.
- Responsible for developing data pipelines by implementing Kafka producers and consumers (see the producer/consumer sketch at the end of this section).
- Worked on ETL scripts and fixed issues that arose while loading data from various sources.
- Performed data analysis with HBase using Apache Phoenix.
- Supported existing BI solution, data marts and ETL processes.
- Exported the analyzed data to Impala to generate reports for the BI team.
- Managed and reviewed Hadoop log files to resolve configuration issues.
- Developed a program to extract named entities from OCR files.
- Fixed defects during the QA phase, supported QA testing, and troubleshot defects to identify their source.
- Used Mingle and later moved to JIRA for task/bug tracking.
- Used Git for version control.
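A hedged sketch of a Kafka producer/consumer pair like the one referenced above, using the kafka-python client; the broker list, topic name, and JSON payload are purely illustrative.

```python
# Hedged kafka-python producer/consumer sketch; broker and topic are
# hypothetical placeholders.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["broker1:9092"]   # placeholder broker list
TOPIC = "orders"             # placeholder topic

# Producer: publish JSON-encoded records.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"order_id": 42, "amount": 19.99})
producer.flush()

# Consumer: read records from the beginning of the topic.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.offset, message.value)
```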
Environment: MapR, Cloudera, Hadoop, Confidential, AWS, Pig, Hive, Impala, Drill, Spark SQL, OCR, MapReduce, Flume, Sqoop, Oozie, Storm, Zeppelin, Mesos, Docker, Solr, Kafka, MapR-DB, Spark, Scala, HBase, ZooKeeper, Tableau, Shell Scripting, Gerrit, Java, Redis.
Confidential
Hadoop Developer
Responsibilities:
- Analyzed the requirements to set up a cluster.
- Worked on analyzing the Hadoop cluster and different big data analytic tools including MapReduce, Hive, and Spark.
- Involved in loading data from the Linux file system, servers, and Java web services using Kafka producers and partitions.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
- Implemented Storm topologies to pre-process data before moving it into the Confidential system.
- Implemented Kafka high-level consumers to read data from Kafka partitions and move it into Confidential.
- Implemented a POC to migrate MapReduce programs to Spark transformations using Spark and Scala.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data with MapReduce, Hive, and Pig.
- Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
- Implemented Python scripts for MapReduce programs using Hadoop Streaming (see the mapper/reducer sketch at the end of this section).
- Used HCatalog to access Hive table metadata from MapReduce and Pig code.
- Implemented custom Flume serializers, interceptors, sources, and sinks as required to ingest data from multiple sources.
- Set up fan-out workflows in Flume, designing a V-shaped architecture to take data from many sources and ingest it into a single sink.
- Developed Shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
- Implemented monitoring on all NiFi flows to receive notifications when no data flows through a flow for longer than a specified time.
- Converted unstructured data to structured data by writing Spark code.
- Indexed documents using Apache Solr.
- Set up SolrCloud for distributed indexing and search.
- Worked on NoSQL databases such as Cassandra and MongoDB for POC purposes, storing images and URIs.
- Integrated bulk data into the Cassandra file system using MapReduce programs.
- Worked on MongoDB for distributed storage and processing.
- Designed and implemented Cassandra and associated RESTful web service.
- Implemented row-level updates and real-time analytics using CQL on Cassandra data.
- Worked on analyzing and examining customer behavioral data using Cassandra.
- Created partitioned tables in Hive and mentored the analyst and SQA teams in writing Hive queries.
- Developed Pig Latin scripts to extract data from the web server output files and load it into Confidential.
- Involved in cluster setup, monitoring, test benchmarks for results.
- Involved in building and deploying applications using Maven, integrated with the Jenkins CI/CD server.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
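A hedged Hadoop Streaming sketch for the Python MapReduce work mentioned above, using word count as a stand-in for the actual parsing logic; the jar name and HDFS paths in the comment are illustrative.

```python
# mapper.py -- Hadoop Streaming mapper sketch (word count as a stand-in).
# Launched with something like:
#   hadoop jar hadoop-streaming.jar -input /raw/logs -output /out/wordcount \
#     -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py
import sys

for line in sys.stdin:
    for word in line.strip().split():
        # Emit tab-separated key/value pairs, one per word.
        print(f"{word}\t1")
```

```python
# reducer.py -- sums the counts emitted by mapper.py; Hadoop Streaming
# delivers input lines sorted by key, so equal keys are adjacent.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")
```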
Environment: Hadoop, Cloudera, Confidential, Pig, Hive, Flume, Sqoop, NiFi, AWS Redshift, Python, Spark, Scala, MongoDB, Cassandra, Snowflake, Solr, ZooKeeper, MySQL, Talend, Shell Scripting, Linux Red Hat, Java.
Confidential
Hadoop Developer/Administrator
Responsibilities:
- Resource management of Hadoop Cluster including adding/removing cluster nodes for maintenance and capacity needs.
- Responsible for monitoring the Hadoop cluster using Zabbix/Nagios.
- Converted the existing relational database model to the Hadoop ecosystem.
- Installed and configured Flume, Oozie on the Hadoop cluster.
- Managed, defined, and scheduled jobs on the Hadoop cluster.
- Generated datasets and loaded them into the Hadoop ecosystem.
- Worked with Linux systems and RDBMS database on a regular basis to ingest data using Sqoop.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Involved in review of functional and non-functional requirements.
- Implemented frameworks using Java and Python to automate the ingestion flow.
- Responsible for managing data coming from different sources.
- Loaded CDRs from the relational DB into the Hadoop cluster using Sqoop, and data from other sources using Flume.
- Processed large volumes of data, using Talend functionality for parallel execution of processes.
- Involved in loading data from the UNIX file system and FTP to Confidential.
- Designed and implemented Hive queries and functions for evaluating, filtering, loading, and storing data.
- Creating Hive tables and working on them using HiveQL.
- Developed a data pipeline using Kafka and Storm to store data into Confidential.
- Created reporting views in Impala using Sentry policy files.
- Developed Hive queries to analyze the output data.
- Handled cluster coordination services through ZooKeeper.
- Collected log data from web servers and stored it in Confidential using Flume.
- Used Hive for transformations, event joins, and pre-aggregations before storing the data in Confidential.
- Implemented several Akka actors responsible for loading data into Hive.
- Designed and implemented Spark jobs to support distributed data processing (see the batch sketch at the end of this section).
- Supported the existing MapReduce programs running on the cluster.
- Developed and implemented two service endpoints (end to end) in Java using the Play framework, Akka, and Hazelcast.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
- Wrote Java code to format XML documents and upload them to the Solr server for indexing.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Developed PowerCenter mappings to extract data from various databases and flat files and load it into the data mart using Informatica.
- Followed agile methodology for the entire project.
- Installed and configured Apache Hadoop, Hive and Pig environment.
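A hedged PySpark batch sketch in the spirit of the Spark jobs and CDR loading described above; the CDR column names, input path, and output path are assumptions, not details from the engagement.

```python
# Hedged PySpark batch sketch; schema and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdr-aggregation-sketch").getOrCreate()

# Hypothetical CDR extract landed on HDFS as CSV by Sqoop/Flume.
cdrs = (spark.read
        .option("header", "true")
        .csv("hdfs:///data/cdrs/2016-01-*"))

# Aggregate call duration per subscriber per day.
daily_usage = (cdrs
               .withColumn("call_date", F.to_date("start_time"))
               .groupBy("subscriber_id", "call_date")
               .agg(F.sum(F.col("duration_sec").cast("long"))
                     .alias("total_duration_sec")))

(daily_usage.write
            .mode("overwrite")
            .partitionBy("call_date")
            .parquet("hdfs:///warehouse/cdr_daily_usage"))

spark.stop()
```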
Environment: Hadoop, Hortonworks, Confidential, Pig, Hive, Flume, Sqoop, Ambari, Ranger, Python, Akka, Play framework, Informatica, Elasticsearch, Linux Ubuntu, Solr.
Confidential
Data Analyst
Responsibilities:
- Acted as a liaison between the IT developers and business stakeholders and was instrumental in resolving conflicts between the management and technical teams.
- Worked with business users on requirement gathering, understanding intent, and defining scope, and was responsible for project status updates to business users.
- Performed analysis and provided summaries for business questions, initiating proactive investigations into data issues that impact reporting, business analysis, or program execution.
- Created views for reporting purposes involving complex SQL queries with sub-queries, inline views, multi-table joins, WITH clauses, and outer joins, per the functional needs in the Business Requirements Document (BRD).
- Involved in performance tuning of slow-running SQL queries; created indexes, constraints, and rules on database objects for optimization.
- Developed functions, views and triggers for automation.
- Assisted in mining data from the SQL database that was used in several significant presentations.
- Assisted in offering support to other personnel who were required to access and analyze the SQL database.
- Worked with Python modules and packages.
- Hands-on experience in Python scripting and in web development using Django.
- Used Python scripts to update content in the database and manipulate files (see the update-script sketch at the end of this section).
- Analyzed various backup compression tools available and made the recommendations.
- Performed data analysis and data profiling using complex SQL on various sources systems including Confidential and Teradata.
- Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
- Extensive experience in the design and implementation of PL/SQL stored procedures, functions, packages, views, cursors, ref cursors, collections, records, object types, database triggers, exception handling, forms, reports, and table partitioning.
- Involved in T-SQL programming to implement stored procedures and functions for different tasks.
- Responsible for creating databases, tables, indexes, unique/check constraints, views, stored procedures, triggers, and rules.
- Optimized the performance of queries by modifying the existing index system and rebuilding indexes.
- Coordinated project activities between clients, internal groups, and information technology, including project portfolio management and project pipeline planning. Worked in close collaboration with the Project Management Office and business users to gather, analyze, and document the functional requirements for the project.
- Responsible for development of workflow analysis, requirement gathering, data governance, data management and data loading.
- Analyzed and documented data flow from source systems and managed the availability and quality of data.
- Performed root cause analysis of data discrepancies between different business systems, examining business rules and data models, and provided the analysis to the development/bug-fix team.
- Hands-on experience writing queries, stored procedures, functions, PL/SQL packages, and triggers in Confidential, as well as reports and scripts.
- Evaluated existing practices for storing and handling important financial data for compliance. Ensured corporate compliance with all billing and credit standards, with direct responsibility for accounts receivable and supervision of accounts payable.
- Set up data governance touch points with key teams to ensure data issues were addressed promptly.
- Responsible for facilitating UAT (User Acceptance Testing), PPV (Post Production Validation) and maintaining Metadata and Data dictionary.
- Responsible for source data cleansing, analysis and reporting using pivot tables, formulas (v-lookup and others), data validation, conditional formatting, and graph and chart manipulation in Excel.
- Actively involved in data modeling for the QRM Mortgage Application migration to Teradata and developed the dimensional model.
- Experience developing SQL*Loader control programs and PL/SQL validation scripts to validate and load data from staging tables to production tables.
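A hedged sketch of the kind of Python script used to update database content, as referenced above. sqlite3 is used here only so the example is self-contained and runnable; the actual work targeted Confidential/Teradata through their own drivers, and the table and CSV file names are hypothetical.

```python
# Hedged database-update sketch; sqlite3 and the file/table names are
# stand-ins for the real Confidential/Teradata targets.
import csv
import sqlite3

conn = sqlite3.connect("reporting.db")   # placeholder database
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS customer (
        customer_id INTEGER PRIMARY KEY,
        status      TEXT
    )
""")

# Apply status corrections from a hypothetical cleanup file
# with columns: customer_id,status
with open("status_fixes.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        cur.execute(
            "UPDATE customer SET status = ? WHERE customer_id = ?",
            (row["status"], row["customer_id"]),
        )

conn.commit()
conn.close()
```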
Environment: Agile, Teradata, Confidential 12c, SQL, PL/SQL, Unix Shell Scripts, Python 2.7, MDX/DAX, SAS, PROC SQL, MS Office Tools, MS Project, Windows XP, MS Access, Pivot Tables