Senior Hadoop Developer Resume
TX
SUMMARY:
- 7+ years of experience across the software development life cycle, including design, development and support of system application architecture.
- 5+ years of experience with the Hadoop ecosystem for big data ingestion, storage, querying, processing and analysis.
- Experience working with Hadoop clusters on the Cloudera (CDH3) distribution.
- Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode, and of MapReduce concepts.
- In-depth knowledge and hands-on experience installing, configuring, monitoring and integrating Hadoop ecosystem components (HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Oozie).
- Good experience in Cloudera, Hortonworks & Apache Hadoop distributions.
- Extensively worked on the MRv1 and MRv2 (YARN) Hadoop architectures.
- Designed and created Hive external tables with static and dynamic partitioning and bucketing, using a shared metastore instead of the embedded Derby metastore.
- Exposure to Spark, Kafka and Scala programming.
- Expertise in database design and data modeling, schema creation and management, and writing stored procedures, functions, DDL, DML and SQL queries.
- Proficient in applying RDBMS concepts with Oracle, SQL Server and MySQL.
- Strong experience in database design and in writing complex SQL queries and stored procedures.
- Familiar with the Java Virtual Machine (JVM) and multithreaded processing.
- Extensively used ETL methodology to support data extraction, transformation and loading.
- Experience analyzing data using HiveQL, Pig Latin, HBase and custom MapReduce programs in Java and Python.
- Experience writing MapReduce programs on Apache Hadoop to process big data.
- Experience with NoSQL databases including HBase and Cassandra.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experience with Eclipse and RSA.
- Knowledge of workflow scheduling and coordination tools such as Oozie and ZooKeeper; NoSQL databases such as HBase and Cassandra; and administrative tasks such as installing Hadoop and its ecosystem components (Flume, Oozie, Hive, Pig) and commissioning and decommissioning nodes.
- Excellent teamwork and communication skills; research-minded, technically competent and results-oriented, with strong problem-solving abilities.
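Hive partitioning, as mentioned in the summary, stores each partition as a `column=value` subdirectory under the table location. The Python sketch below is a simplified illustration, not Hive's own code, of how a dynamic-partition insert routes rows into those directories. It assumes partition values need no escaping and uses Hive's default name `__HIVE_DEFAULT_PARTITION__` for null values; the column names in the test data are hypothetical.

```python
from collections import defaultdict

# Hive's default directory name for a partition whose value is NULL
# (configurable through hive.exec.default.partition.name).
DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def partition_path(row, partition_cols):
    """Build a Hive-style path such as 'dt=2017-01-01/country=US'.

    Simplification: values are assumed to contain no characters
    that Hive would need to escape.
    """
    parts = []
    for col in partition_cols:
        value = row.get(col)
        parts.append(f"{col}={DEFAULT_PARTITION if value is None else value}")
    return "/".join(parts)


def route_rows(rows, partition_cols):
    """Group rows by partition directory, as a dynamic-partition insert does.

    Partition columns are dropped from the stored record because Hive keeps
    them in the directory path, not in the data files.
    """
    partitions = defaultdict(list)
    for row in rows:
        data = {k: v for k, v in row.items() if k not in partition_cols}
        partitions[partition_path(row, partition_cols)].append(data)
    return dict(partitions)
```

Dropping the partition columns from each record mirrors why Hive can prune whole directories at query time: the filter value is readable from the path alone.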
TECHNICAL SKILLS:
Programming Languages: Core Java, J2EE, Scala, XML, DB2, CICS, SQL, PL/SQL, HiveQL, Pig Latin
Hadoop Eco System: HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, Oozie, Apache Kafka and Kerberos
Hadoop Distributions: Cloudera, Hortonworks
Operating Systems: Linux, Unix, MVS, Windows
Non-Relational Databases: MongoDB, Cassandra, HBase
Relational Databases: DB2 V 9.0, MySQL, Microsoft SQL Server
Scripting Languages: Python, Shell Scripting
Application/Web Servers: Apache Tomcat, JBoss, WebSphere, MQSeries, DataPower, Web services
Tools: Endeavour, DataPower XI150 Appliance, SoapUI, JMeter, XML Harness, Labs testing tool
QA Tools: Quality Center
IDE: IntelliJ, Eclipse, NetBeans
PROFESSIONAL EXPERIENCE:
Senior Hadoop Developer
Confidential, TX
Responsibilities:
- Analyzed the existing environment from high-level documentation to initiate the ETL process.
- Used the data integration tool Pentaho to design ETL jobs for building data warehouses and data marts.
- Participated in performance tuning at the database, transformation and job levels.
- Designed jobs and transformations that load data sequentially and in parallel for initial and incremental loads.
- Used Pentaho Data Integration Designer to create ETL transformations.
- Developed ETL transformations that sourced data from a variety of heterogeneous sources, including text and JSON files. Worked with operational cybersecurity data arriving in massive volumes from multiple sources.
- Imported and exported data between databases and HDFS using ETL jobs.
- Wrote ETL jobs to parse logs and insert them into Impala tables to enable efficient querying of log data.
- Wrote MapReduce code to parse data from various sources and store the parsed data in tables in Parquet format.
- Created crontab entries to run multiple jobs independently based on schedule and data availability.
- Developed ETL jobs and automated end-to-end data management and integration work.
- Developed a MapReduce program to parse data and load it into HDFS.
- Worked on QRadar, which collects log data from enterprise network devices, operating systems, applications, and user activities and behaviours.
- Tested visualizations and reports for data accuracy and functionality.
Environment: Hadoop, Spark, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, QRadar, Cloudera 5.8, Impala, Spotfire, Eclipse, Pentaho, Oracle
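Incremental loads like those described in this role are commonly driven by a high watermark: each run extracts only rows changed since the last recorded timestamp. The Python sketch below illustrates that pattern under stated assumptions; it is not the Pentaho job itself, and the `updated_at` column is a hypothetical change-tracking field.

```python
from datetime import datetime


def incremental_extract(source_rows, last_watermark=None):
    """Return (rows_to_load, new_watermark).

    last_watermark=None means there was no previous run, so every row is
    loaded (the initial load). Otherwise only rows whose hypothetical
    'updated_at' column is newer than the watermark are loaded.
    """
    if last_watermark is None:
        rows = list(source_rows)
    else:
        rows = [r for r in source_rows if r["updated_at"] > last_watermark]
    # Persist the newest timestamp seen; keep the old one if nothing changed.
    new_watermark = max((r["updated_at"] for r in rows), default=last_watermark)
    return rows, new_watermark


if __name__ == "__main__":
    source = [
        {"id": 1, "updated_at": datetime(2017, 1, 1)},
        {"id": 2, "updated_at": datetime(2017, 1, 3)},
    ]
    rows, mark = incremental_extract(source)        # initial load: both rows
    rows, mark = incremental_extract(source, mark)  # nothing newer: no rows
    print(len(rows), mark)
```

Keeping the old watermark when a run finds no changes avoids reloading everything on the next run.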
Senior Hadoop Developer
Confidential, IL
Responsibilities:
- Developed data pipeline using Kafka, Flume, Sqoop, Pig and Spark to ingest customer behavioural data and financial histories into HDFS for analysis.
- Developed the Turbocow framework to filter raw data and enrich it with business rules.
- The framework provides actions such as lookups, replacing nulls with zero, and simple copies for downstream use.
- Used Sqoop to bring data from Teradata into HDFS for lookups against dimensional data.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting. Developed Hive UDFs for functionality not available out of the box in Apache Hive.
- Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS. Designed and implemented metrics that can statistically indicate the success of an experiment.
- Used IntelliJ to build the application.
- Hands-on experience with Hadoop clusters on CDH 5.4 and 5.5.
- Extracted customers' big data from various sources into HDFS, including data from mainframes, databases and server logs.
- Managed the entire Hive warehouse.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie operational services for batch processing and dynamic workflow scheduling.
- Ingested streaming JSON data from a Kafka cluster and imported Teradata tables periodically as batch jobs.
- Managed Flume agents that ingest the ALS event data stream from Kafka into HDFS in compressed form for batch processing with Spark, and that also stream raw data to Spark Streaming.
- Used Sqoop to periodically import tables and data from Teradata to HDFS.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
- Worked on Impala performance tuning with different workloads and file formats.
Environment: Hadoop, Spark, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Kafka, Cloudera 5.5, Impala, Tableau, Eclipse, IntelliJ
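The Turbocow framework's actual configuration format is not described in this resume, so the following Python sketch uses a hypothetical, dictionary-based action list to illustrate the three actions named in this role (lookup, null-to-zero replacement and simple copy) applied to a single raw record.

```python
def apply_actions(record, actions, lookup_tables):
    """Build an enriched record from a raw one using a list of actions.

    Each action is a dict with a 'type' plus 'source' and 'target' fields;
    the action names and this config shape are hypothetical.
    """
    enriched = {}
    for action in actions:
        value = record.get(action["source"])
        kind = action["type"]
        if kind == "copy":
            enriched[action["target"]] = value
        elif kind == "null_to_zero":
            enriched[action["target"]] = 0 if value is None else value
        elif kind == "lookup":
            # Keys missing from the dimension table map to None
            # instead of failing the whole record.
            table = lookup_tables[action["table"]]
            enriched[action["target"]] = table.get(value)
        else:
            raise ValueError(f"unknown action type: {kind}")
    return enriched
```

Driving enrichment from a declarative action list lets business rules change without recompiling the pipeline; an unknown action type fails loudly rather than silently dropping a field.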
Senior Hadoop Developer
Harley-Davidson, WI
Responsibilities:
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioural data and financial histories into HDFS for analysis.
- Wrote MapReduce jobs.
- Used Sqoop and HDFS put/copyFromLocal to ingest data.
- Used Pig to perform transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data in HDFS.
- Developed Pig UDFs for functionality not available out of the box in Apache Pig.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive DDL to create, alter and drop Hive tables.
- Developed Hive UDFs for functionality not available out of the box in Apache Hive.
- Used HCatalog to access Hive table metadata from MapReduce and Pig code.
- Used Java MapReduce to compute metrics that define user experience, revenue, etc.
- Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS. Designed and implemented metrics that can statistically indicate the success of an experiment.
- Used Eclipse and Ant to build the application.
- Used Sqoop to import and export data to and from HDFS and Hive.
- Processed ingested raw data using MapReduce, Apache Pig and Hive.
- Developed Pig scripts for change data capture (CDC) and delta record processing between newly arrived data and data already in HDFS.
- Pivoted HDFS data from rows to columns and from columns to rows.
- Exported processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get or copyToLocal.
- Developed shell scripts to orchestrate the execution of all other scripts (Pig, Hive, MapReduce) and to move data files within and outside HDFS.
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g, Core Java, Cloudera, HDFS, Eclipse
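The change data capture step in this role compares newly arrived data with data already in HDFS. A minimal Python sketch of that comparison follows. It assumes both datasets fit in memory as dictionaries keyed by a primary key and that the incoming data is a full snapshot, so keys missing from it count as deletes; the original work used Pig scripts, not this code.

```python
def capture_changes(existing, incoming):
    """Classify records as inserts, updates or deletes.

    existing, incoming: dicts mapping primary key -> record.
    Assumes `incoming` is a full snapshot, so a key present only in
    `existing` is treated as deleted.
    """
    # New keys that were never seen before.
    inserts = {k: v for k, v in incoming.items() if k not in existing}
    # Known keys whose record content changed.
    updates = {k: v for k, v in incoming.items()
               if k in existing and existing[k] != v}
    # Known keys that disappeared from the snapshot (sorted for stable output).
    deletes = sorted(k for k in existing if k not in incoming)
    return inserts, updates, deletes
```

If the incoming feed were a delta rather than a full snapshot, the delete rule would not apply and deletes would need an explicit flag in the source data.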
Senior Hadoop Developer
Confidential, MI
Responsibilities:
- Developed and designed ETL processes in Hadoop.
- Developed a custom MapReduce InputFormat to read a visa-specific data format.
- Performance tuning of Hive Queries written by data analysts.
- Developed Hive queries and UDFs as required.
- Extracted customers' big data from various sources into HDFS, including data from mainframes, databases and server logs.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Created Hive tables as required, as managed or external tables with appropriate static and dynamic partitions for efficiency.
- Implemented partitioning and bucketing in Hive for better data organization.
- Developed UDFs in Pig and Hive.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Worked with BI teams to generate reports in Tableau.
- Installed and configured various components of Hadoop ecosystem and maintained their integrity.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Upgraded Hadoop versions using automation tools.
- Deployed NameNode high availability on the Hadoop cluster using quorum journal nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
- Migrated existing Ab Initio transformation logic to Hadoop Pig Latin and UDFs.
- Used Sqoop to efficiently transfer data from DB2 and Oracle Exadata to HDFS.
- Designed ETL flows for several newly onboarded Hadoop applications.
- Implemented Hadoop Streaming with Python MapReduce for visa analytics.
- Implemented NLineInputFormat to divide a single input file into splits of N lines each, so that each mapper processes a fixed number of lines.
- Designed and developed Oozie workflows integrated with HCatalog/Pig.
- Documented ETL best practices for implementation with Hadoop.
- Managed and supported infrastructure.
- Monitored and debugged Hadoop jobs/applications running in production.
- Worked on the Cloudera Hadoop upgrade from CDH4.x to CDH5.x.
- Provided user and application support on Hadoop infrastructure.
- Reviewed ETL application use cases before onboarding them to Hadoop.
- Evaluated and compared different tools for test data management with Hadoop.
- Helped the testing team get up to speed on Hadoop application testing.
- Integrated HiveServer2 with Tableau.
- Worked on impala performance tuning with different workloads and file formats.
- Installed a 20-node UAT Hadoop cluster.
- Worked on a POC of Talend integration with Hadoop.
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, DB2, Oracle, XML, Cloudera Manager.
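Hadoop Streaming runs any executable as mapper and reducer: the mapper writes tab-separated key/value lines to standard output, the framework sorts them by key, and the reducer reads the sorted lines from standard input. The Python sketch below shows that contract with a hypothetical per-key record count, assuming the first comma-separated field of each record is the key; `run_local` simulates the shuffle/sort in a single process for testing.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'key<TAB>1' per record; using the first comma-separated
    field as the key is a hypothetical choice for illustration."""
    for line in lines:
        line = line.strip()
        if line:
            yield f"{line.split(',', 1)[0]}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key. Hadoop Streaming delivers reducer
    input sorted by key, so equal keys arrive consecutively."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> sort -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))
```

On a cluster, the mapper and reducer would usually live in separate executable scripts passed to the hadoop-streaming jar through its `-mapper` and `-reducer` options, each reading lines from `sys.stdin` and printing its output lines.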
Hadoop Developer
Confidential, NY
Responsibilities:
- Managed data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Moved log files generated from various sources to HDFS through Flume for further processing.
- Loaded and transformed large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Analyzed and cleansed raw data by running Hive queries and Pig scripts.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie operational services for batch processing and dynamic workflow scheduling.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Fetched brand data from social media applications such as Facebook and Twitter.
- Developed and regularly updated social media analytics dashboards.
- Performed data mining investigations to find new insights related to customers.
- Produced forecasts based on current results and insights derived from data analysis.
- Created a complete processing engine based on the Cloudera distribution, tuned for performance.
- Managed and reviewed Hadoop log files.
- Developed insights from brand conversations that helped drive brand awareness, engagement and traffic to social media pages.
- Identified topics and trends and built context around the brand.
- Identified and analyzed defects, questionable function errors and inconsistencies in output.
Environment: Java, HBase, Hadoop, HDFS, MapReduce, Hive, Sqoop, Flume, Oozie, ZooKeeper, MySQL and Eclipse.
Java Developer
Confidential
Responsibilities:
- Developed GUI changes using JSP and HTML, with client-side validation in JavaScript.
- Designed and developed front end using HTML, JSP and Servlets
- Implemented client side validation using JavaScript
- Used Hibernate in the persistence layer of the application.
- Created UML class diagrams that depict the code’s design and its compliance with the functional requirements.
- Developed user interface using JSP to simplify the complexities of the application.
- Developed the Web Interface using Servlets, Java Server Pages, HTML and CSS.
- Extensively used JDBC PreparedStatement to embed SQL queries in Java code, and implemented the DAO pattern.
- Developed business logic using stateless session beans to calculate asset depreciation using the straight-line and written-down value methods.
- Coded SQL queries, stored procedures and triggers.
- Created java classes to communicate with database using JDBC.
Environment: Java 1.4, Servlets, JSP, EJB, J2EE 1.4, XML, XSLT, JavaScript, SQL, PL/SQL, MS Visio, Eclipse, JDBC, WinCVS, Windows XP.
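The two depreciation methods named in this role differ in what the rate applies to: straight-line spreads (cost minus salvage value) evenly over the useful life, while written-down value applies a fixed rate to the declining book value each year. The original logic lived in Java stateless session beans; this Python sketch only illustrates the arithmetic, and the figures in the usage note are made up.

```python
def straight_line(cost, salvage_value, useful_life_years):
    """Equal annual charge: (cost - salvage value) / useful life."""
    annual = (cost - salvage_value) / useful_life_years
    return [annual] * useful_life_years


def written_down_value(cost, rate, years):
    """Charge a fixed rate on the opening book value of each year,
    so the annual charge declines over time."""
    schedule = []
    book_value = cost
    for _ in range(years):
        charge = book_value * rate
        schedule.append(charge)
        book_value -= charge  # next year's opening book value
    return schedule
```

For example, `straight_line(10000, 1000, 3)` returns `[3000.0, 3000.0, 3000.0]`, while `written_down_value(10000, 0.25, 3)` returns `[2500.0, 1875.0, 1406.25]`, showing the front-loaded charge of the written-down value method.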
Java Developer
Confidential
Responsibilities:
- Used Rational Rose for creating sequence and class diagrams.
- Developed presentation layer using JSP, Java, HTML and JavaScript.
- Participated in the design and development of database schema and Entity-Relationship diagrams of the backend Oracle database tables for the application.
- Designed and developed stored procedures and triggers in Oracle to meet the needs of the entire application. Developed complex SQL queries for extracting data from the database.
- Designed and built SOAP web service interfaces implemented in Java.
- Used Apache Ant for the build process.
Environment: Java, JDK 1.5, Servlets, Ajax, Oracle 10g, Eclipse, Apache Ant, Web Services (SOAP), Apache Axis, WebLogic Server, JavaScript, HTML, CSS, XML.
