Senior Hadoop Developer/Data Engineer Resume
Wausau, WI
SUMMARY:
- Around 7 years of IT experience in software development and support, with expertise in developing strategic methods for deploying Big Data technologies to efficiently solve large-scale data processing requirements.
- Expertise in Hadoop ecosystem components (HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Spark, Spark SQL, Spark Streaming and Hive) for scalability, distributed computing and high-performance computing.
- Experience in using Hive Query Language (HQL) for data analytics.
- Experienced in installing, configuring and maintaining Hadoop clusters.
- Strong knowledge of creating and monitoring Hadoop clusters on Amazon EC2, VMs, Hortonworks Data Platform 2.1 & 2.2, and CDH3/CDH4 with Cloudera Manager on Linux and Ubuntu OS.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Good knowledge of single-node and multi-node cluster configurations.
- Strong knowledge of NoSQL column-oriented databases like HBase, Cassandra and MongoDB, and their integration with Hadoop clusters.
- Experience in creating Hive and HBase tables and working with Apache Phoenix to retrieve data.
- Expertise in the Scala programming language and Spark Core.
- Experienced in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Experience with hybrid cloud deployment models, namely private, public and community clouds; knowledge of the SaaS, PaaS and IaaS service models; implemented solutions on the Amazon cloud.
- Good knowledge of Amazon EMR, S3 buckets, DynamoDB and Redshift.
- Analyze data, interpret results, and convey findings in a concise and professional manner.
- Partner with the Data Infrastructure team and business owners to implement new data sources and ensure consistent definitions are used in reporting and analytics.
- Promote a full-cycle approach including request analysis, creating/pulling datasets, report creation and implementation, and providing final analysis to the requestor.
- Very good understanding of SQL, ETL and data warehousing technologies.
- Knowledge of MS SQL Server 2012/2008/2005, Oracle 11g/10g/9i and E-Business Suite.
- Expert in T-SQL, creating and using stored procedures, views and user-defined functions, and implementing Business Intelligence solutions using SQL Server 2000/2005/2008.
- Developed web services modules for integration using SOAP and REST.
- NoSQL database experience on HBase, Cassandra.
- Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14 and Cosmos.
- Good experience with Kafka and Storm.
- Experience in writing build scripts using Maven and working with continuous integration systems like Jenkins.
- Java developer with extensive experience with various Java libraries, APIs and frameworks.
- Hands-on development experience with RDBMS, including writing complex SQL queries, stored procedures and triggers.
- Sound knowledge of designing data warehousing applications using tools like Teradata, Oracle and SQL Server.
- Experience using the Talend ETL tool.
- Experience in working with job schedulers like Autosys and Maestro.
- Strong in databases like Sybase, DB2, Oracle, MS SQL.
- Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
- Strong communication, collaboration & team-building skills with proficiency at grasping new technical concepts quickly and utilizing them in a productive manner.
- Adept in analyzing information system needs, evaluating end-user requirements, custom designing solutions and troubleshooting information systems.
- Strong analytical and problem-solving skills.
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Tez, Flume, Kafka, Pig, Hive, Presto, Phoenix, Oozie, Impala, Ambari, Spark, ZooKeeper and Cloudera Manager.
NoSQL Databases: HBase, Cassandra
Monitoring and Reporting: Tableau, Custom shell scripts
Hadoop Distribution: Hortonworks, Cloudera, MapR
Build Tools: Maven, SQL Developer
Programming & Scripting: Java, SQL, Shell Scripting, Python, Scala
Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/REST services
Databases: Oracle, MySQL, MS SQL Server, Teradata
Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript, AngularJS
Version Control and Cloud: SVN, CVS, Git, AWS, Azure
Operating Systems: Linux, Unix, Mac OS X, CentOS, Windows 10, Windows 8, Windows 7, Windows Server 2008/2003
PROFESSIONAL EXPERIENCE:
Confidential, Wausau, WI
Senior Hadoop Developer/Data Engineer
Responsibilities:
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Worked on Sqoop to import data from various relational data sources.
- Worked on strategizing Sqoop jobs to parallelize data loads from source systems.
- Built Hive tables from vendor-generated flat files.
- Installed and configured Hadoop, MapReduce and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Good experience with Python, Pig, Sqoop, Oozie, Hadoop Streaming and Hive.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Worked with cloud services like Amazon Web Services (AWS) and was involved in ETL, data integration and migration.
- Worked on Amazon Redshift, the data warehouse product within AWS (Amazon Web Services).
- Developed Spark and Spark SQL scripts to migrate data from RDBMS into AWS Redshift (see the sketch after this list).
- Developed ETL Scripts for Data acquisition and Transformation using Talend.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Wrote test cases, analyzed results and reported them to product teams.
- Developed bash scripts to pull log files from the FTP server and process them for loading into Hive tables.
- Used Sqoop to load data from SQL sources into the HBase environment.
- Prepared ad-hoc Phoenix queries on HBase.
- Created secondary index tables on HBase tables using Phoenix.
- Performed daily insert-overwrites of the Hive data with HBase data to keep the data fresh.
- Scheduled all bash scripts using the Resource Manager scheduler.
- Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
- Created a POC using NiFi for sentiment analysis.
- Load and transform large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
- Responsible for creating Hive external tables, loading data into them and querying the data using HQL.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
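A minimal sketch of the kind of Spark SQL migration script described in the Redshift bullet above, assuming a JDBC-reachable source database and a Redshift cluster; the hostnames, schema/table names and credential environment variables are illustrative placeholders, and the source and Redshift JDBC drivers are assumed to be on the Spark classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class RdbmsToRedshift {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("RdbmsToRedshift")
                .getOrCreate();

        // Read the source table over JDBC (placeholder Oracle URL and table name).
        Dataset<Row> source = spark.read()
                .format("jdbc")
                .option("url", "jdbc:oracle:thin:@//source-host:1521/ORCL")
                .option("dbtable", "SALES.TRANSACTIONS")
                .option("user", System.getenv("SRC_DB_USER"))
                .option("password", System.getenv("SRC_DB_PASS"))
                .load();

        // Light cleanup before the load: drop rows missing the key column.
        Dataset<Row> cleaned = source.filter("TRANSACTION_ID IS NOT NULL");

        // Append into a Redshift staging table over JDBC (placeholder endpoint).
        cleaned.write()
                .format("jdbc")
                .option("url", "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dw")
                .option("dbtable", "staging.transactions")
                .option("user", System.getenv("RS_USER"))
                .option("password", System.getenv("RS_PASS"))
                .mode(SaveMode.Append)
                .save();

        spark.stop();
    }
}
```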
Environment: Hadoop Cluster, AWS, HDFS, Hive, Kafka, Pig, Sqoop, Presto, Linux, Yarn, Oozie, Hadoop MapReduce, HBase, Shell Scripting, Ambari, Tez, Cassandra, Apache Spark.
Confidential, Houston, TX
Senior Hadoop Developer
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Installed and configured Hadoop, MapReduce and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Created a POC using Presto for data analytics.
- Wrote test cases, analyzed results and reported them to product teams.
- Responsible for developing a data pipeline using Azure HDInsight, Flume, Sqoop and Pig to extract data from web logs and store it in HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, and used Sqoop to import and export data between HDFS and RDBMS for visualization and report generation.
- Involved in migrating ETL processes from Oracle to Hive to simplify data manipulation.
- Worked in functional, system and regression testing activities with agile methodology.
- Worked on a Python plugin for MySQL Workbench to upload CSV files.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Good experience with Python, Pig, Sqoop, Oozie, Hadoop Streaming and Hive.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop, and responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
- Developed several new MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
- Worked extensively with importing metadata into Hive using Sqoop and migrated existing tables and applications to work on Hive.
- Extracted, loaded and transformed data using Talend.
- Responsible for running Hadoop Streaming jobs to process terabytes of XML data, and utilized cluster coordination services through ZooKeeper.
- Extensive experience using message-oriented middleware (MOM) with ActiveMQ, Apache Storm, Apache Spark, Kafka, Maven and ZooKeeper.
- Worked on the core and Spark SQL modules of Spark extensively.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (see the sketch after this list).
- Analyzed the SQL scripts and designed the solution to implement using PySpark.
- Experience using Spark and writing Scala programs for batch processing of large datasets.
- Load and transform large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
- Responsible for creating Hive external tables, loading data into them and querying the data using HQL.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
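A minimal sketch of a Kafka producer of the kind referenced above; the broker list, topic name and message payloads are illustrative placeholders, and the Kafka clients library is assumed to be on the classpath.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickstreamProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // wait for full acknowledgement of each send

        // Publish each command-line argument as one event on the (illustrative) clickstream topic.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String event : args) {
                producer.send(new ProducerRecord<>("clickstream", null, event));
            }
            producer.flush();
        }
    }
}
```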
Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, Zookeeper, Linux, Hadoop MapReduce, HBase, Shell Scripting, MongoDB, Cassandra, Apache Spark.
Confidential, Lakeland, Florida
Senior Hadoop Developer
Responsibilities:
- Performed benchmarking of HDFS and the ResourceManager using TestDFSIO and TeraSort.
- Worked on Sqoop to import data from various relational data sources.
- Worked with Flume to bring clickstream data from front-facing application logs.
- Worked on strategizing Sqoop jobs to parallelize data loads from source systems.
- Participated in providing inputs for design of the ingestion patterns.
- Designed and executed a POC on ETL using Azure HDInsight and Talend.
- Participated in strategizing loads without impacting front-facing applications.
- Worked on the design of the Hive data store for data from various data sources.
- Involved in brainstorming sessions for sizing the Hadoop cluster.
- Involved in providing inputs to analyst team for functional testing.
- Worked with source system load testing teams to perform loads while ingestion jobs are in progress.
- Worked on performing data standardization using PIG scripts.
- Worked on installation and configuration of a Hortonworks cluster from the ground up.
- Managed various groups for users with different queue configurations.
- Worked on building analytical data stores for the data science team's model development.
- Worked on design and development of Oozie workflows to orchestrate Pig and Hive jobs.
- Worked on performance tuning of Hive queries through partitioning and bucketing (see the sketch after this list).
- Worked extensively on the core and Spark SQL modules of Spark.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Analyzed SQL scripts and designed solutions to implement them using PySpark.
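A minimal sketch of the partitioning and bucketing approach described above, executed here through the Hive JDBC driver; the HiveServer2 endpoint, database, table and column names are illustrative placeholders, and the Hive JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HivePartitionTuning {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // register the Hive JDBC driver

        // Placeholder HiveServer2 endpoint and database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver2-host:10000/analytics", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Partition by load date and bucket by customer id so that daily queries
            // prune partitions and work keyed on customer_id benefits from bucketing.
            stmt.execute("CREATE TABLE IF NOT EXISTS orders_part ("
                    + " order_id BIGINT, customer_id BIGINT, amount DOUBLE)"
                    + " PARTITIONED BY (load_dt STRING)"
                    + " CLUSTERED BY (customer_id) INTO 32 BUCKETS"
                    + " STORED AS ORC");

            // A query restricted to a single partition reads only that partition's files.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT customer_id, SUM(amount) AS total"
                    + " FROM orders_part"
                    + " WHERE load_dt = '2016-03-01'"
                    + " GROUP BY customer_id")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }
}
```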
Environment: Hadoop, HDFS, MapReduce, Flume, Pig, Sqoop, Hive, Oozie, Ganglia, HBase, Shell Scripting, Apache Spark.
Confidential, CA
Senior Hadoop Developer
Responsibilities:
- Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Spark, Avro, ZooKeeper, etc.) on the Cloudera distribution of Hadoop (CDH4).
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing (see the sketch after this list).
- Involved in installing Hadoop ecosystem components.
- Imported and exported data into HDFS, Pig, Hive and HBase using Sqoop.
- Responsible for managing data coming from different sources, ingested through Flume and, for relational database management systems, through Sqoop.
- Involved in gathering the requirements, designing, development and testing.
- Worked on loading and transformation of large sets of structured and semi-structured data into the Hadoop system.
- Developed simple and complex MapReduce programs in Java for data analysis.
- Loaded data from various data sources into HDFS using Flume.
- Developed Pig UDFs to pre-process the data for analysis.
- Worked on Hue interface for querying the data.
- Created Hive tables to store the processed results in a tabular format.
- Developed Hive Scripts for implementing dynamic Partitions.
- Developed Pig scripts for data analysis and extended its functionality by developing custom UDF's.
- Extensive knowledge of Pig scripts using bags and tuples.
- Experience in managing and reviewing Hadoop log files.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Exported analyzed data to relational databases using SQOOP for visualization to generate reports for the BI team.
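A minimal sketch of the kind of map-only MapReduce data-cleaning job referenced above, assuming comma-delimited text input; the expected field count, counter names and input/output paths are illustrative.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CleanRecordsJob {

    /** Map-only job: keep well-formed, comma-delimited records and count the rest. */
    public static class CleanMapper extends Mapper<Object, Text, Text, NullWritable> {
        private static final int EXPECTED_FIELDS = 5; // illustrative record width

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length == EXPECTED_FIELDS && !fields[0].trim().isEmpty()) {
                context.write(value, NullWritable.get()); // pass clean records through
            } else {
                context.getCounter("clean", "bad_records").increment(1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0); // map-only: cleaned records are written directly
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```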
Environment: Hadoop (CDH4), UNIX, Eclipse, HDFS, Java, MapReduce, Apache Pig, Hive, HBase, Oozie, SQOOP and MySQL.
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Developed Hive UDFs to bring all customer email IDs into a structured format (see the sketch after this list).
- Developed bash scripts to pull log files from the FTP server and process them for loading into Hive tables.
- Used Sqoop to load data from DB2 into the HBase environment.
- Performed daily insert-overwrites of the Hive data with HBase data to keep the data fresh.
- Scheduled all bash scripts using the Resource Manager scheduler.
- Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
- Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Developed Hive queries for Analysis across different banners.
- Successfully designed and developed a solution for speeding up a SQL job using the Hadoop MapReduce framework, reducing processing time from hours to minutes.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Developed MapReduce jobs for log analysis, recommendation and analytics.
- Developed MapReduce programs that filter out bad and unnecessary records and find unique records based on different criteria.
- Responsible for performing extensive data validation using Hive.
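A minimal sketch of an email-normalizing Hive UDF of the kind described above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization rules are illustrative.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Normalizes customer email ids: trims whitespace, lower-cases the value
 * and returns NULL for values that do not contain an '@'.
 */
public final class NormalizeEmail extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String email = input.toString().trim().toLowerCase();
        return email.contains("@") ? new Text(email) : null;
    }
}
```

Once packaged into a JAR, a function like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.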
Environment: Hadoop, MapReduce, HDFS, Hive, Java (jdk1.6), Hadoop distribution of Hortonworks, DataStax, Flat files, Oracle 11g/10g, PL/SQL, SQL PLUS, UNIX Shell Scripting, Autosys r11.0.
Confidential, Chicago, IL
JAVA/J2EE Developer
Responsibilities:
- Worked with Java, J2EE, Struts, web services and Hibernate in a fast-paced development environment.
- Followed agile methodology, interacted directly with the client on features, implemented optimal solutions, and tailored the application to customer needs.
- Involved in design and implementation of web tier using Servlets and JSP.
- Used Apache POI for reading Excel files.
- Developed the user interface using JSP and JavaScript to view all online trading transactions.
- Designed and developed Data Access Objects (DAO) to access the database.
- Used the DAO Factory and Value Object design patterns to organize and integrate the Java objects.
- Coded JavaServer Pages for dynamic front-end content that use Servlets and EJBs.
- Coded HTML pages using CSS for static content generation with JavaScript for validations.
- Used the JDBC API to connect to the database and carry out database operations (see the sketch after this list).
- Used JSP and JSTL Tag Libraries for developing User Interface components.
- Performed code reviews.
- Performed unit testing, system testing and integration testing.
- Involved in building and deployment of application in Linux environment.
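A minimal sketch of a JDBC-backed Data Access Object along the lines described above; the class, table and column names are illustrative and a container-managed javax.sql.DataSource is assumed.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Data Access Object that looks up a trade amount by id through plain JDBC. */
public class TradeDao {
    private final DataSource dataSource;

    public TradeDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Returns the amount for the given trade id, or null if no row matches. */
    public Double findAmountById(long tradeId) throws SQLException {
        String sql = "SELECT amount FROM trades WHERE trade_id = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, tradeId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getDouble("amount") : null;
            }
        }
    }
}
```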
Environment: Java, J2EE, JDBC, Struts, SQL, Hibernate, Eclipse, Apache POI, CSS.