Big Data / Hadoop Developer Resume
Princeton, NJ
SUMMARY:
- 7+ years of IT experience in software development, big data management, data modeling, data integration, and implementation and testing of enterprise-class systems spanning big data frameworks, advanced analytics, and Java/J2EE technologies.
- 3+ years of hands-on experience with Hadoop components and MapReduce programming for parsing and populating tables over terabytes of data.
- Extensive use of Sqoop, Flume, and Oozie for data ingestion into HDFS and the Hive warehouse.
- Experienced with major Hadoop ecosystem projects such as Pig, Hive, and HBase.
- Good working experience using Sqoop to import data from RDBMS into HDFS and export it back.
- Hands-on experience with performance tuning for data processing in Hive, Impala, Spark, Pig, and MapReduce, using techniques including dynamic partitioning, bucketing, and file compression.
- Experience in collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Expertise in ingesting data from HBase into Solr.
- Extensive debugging experience using the Eclipse debugger.
- Experienced in importing data from various sources using StreamSets.
- Experience with Cloudera, Hortonworks & MapR Hadoop distributions.
- Strong work ethic with desire to succeed and make significant contributions to the organization.
- Strong problem-solving, communication, and interpersonal skills; a good team player.
- Motivated to take on independent responsibility, with the ability to contribute as a productive team member.
- A pleasing personality with the ability to build great rapport with clients and customers.
- Excellent verbal and written communication, presentation, and interpersonal skills.
- Strong leadership qualities, backed by a solid track record as a team player.
- Adept with the latest business/technological trends.
CORE COMPETENCIES
- Hadoop Development & Troubleshooting
- Data Analysis
- Data Visualization & Reporting in Tableau
- Real-time Streaming using Spark
- Map Reduce Programming
- Performance Tuning of Hive & Impala
- Ingesting data from HBase to Solr
- Data import using StreamSets
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, Spark, ZooKeeper, Solr, StreamSets
Apache Spark: Spark, Spark SQL, Spark Streaming, Scala
ETL Tools: Informatica with Hadoop connector, Pentaho, Alteryx
Programming & Scripting Languages: Java, C, Scala, SQL, Unix Shell Scripting, Python
Java/Web Technologies: jQuery, JSP, Servlets
SQL Databases: Oracle, SQL Server 2012, SQL Server 2008 R2, DB2, Teradata
NoSQL: MongoDB, HBase
Development Tools: Maven, Eclipse, IntelliJ, PyCharm
PROFESSIONAL EXPERIENCE:
Confidential, Princeton, NJ
Big Data / Hadoop Developer
Responsibilities:
- Developed simple and complex MapReduce programs in Java for data analysis on different data formats
- Developed MapReduce programs that filter out bad and unnecessary claim records and identify unique records based on account type (see the illustrative sketch at the end of this section)
- Processed semi-structured and unstructured data using MapReduce programs
- Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and preprocessing it with Pig via Oozie coordinator jobs
- Implemented custom data types, InputFormat, RecordReader, OutputFormat, and RecordWriter classes for MapReduce computations
- Worked on CDH4 cluster on CentOS.
- Successfully migrated a legacy application to a big data application using Hive, Pig, and HBase at the production level
- Transformed date-related data into an application-compatible format by developing Apache Pig UDFs
- Developed MapReduce pipeline for feature extraction and tested the modules using MRUnit
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs
- Responsible for performing extensive data validation using Hive
- Implemented Partitioning, Dynamic Partitions and Bucketing in Hive for efficient data access
- Worked with different types of Hive tables, including external and managed tables
- Used Oozie workflow engine to run multiple Hive and Pig jobs
- Involved in installing and configuring Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Involved in designing and developing nontrivial ETL processes within Hadoop using tools like Pig, Sqoop, Flume, and Oozie
- Developed and deployed UI-layer logic using JSP, XML, JavaScript, and HTML/DHTML
- Used DML statements to perform different operations on Hive Tables
- Developed Hive queries for creating foundation tables from stage data
- Used Pig as an ETL tool for transformations, event joins, filtering, and pre-aggregations
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources
- Worked with Sqoop to export analyzed data from HDFS into an RDBMS for report generation and visualization purposes
- Queried and analyzed data from Datastax Cassandra for quick searching, sorting and grouping
- Developed Mapping document for reporting tools
Environment: C, Java (JDK 1.6), Cassandra, Shell Scripting, Apache Hadoop, HDFS, MapReduce, MySQL, DbVisualizer, Linux, Sqoop, Apache Hive, Apache Pig
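Illustrative sketch of the claim-filtering mapper described in this section (class name, delimiter, and field positions are assumed for illustration; not the production code):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper: drops bad/unnecessary claim records and keys the rest by account type
    // so the reducer can identify unique records per account type.
    public class ClaimFilterMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");  // assumed pipe-delimited claim record
            if (fields.length < 5 || fields[0].isEmpty()) {
                return;                                       // skip malformed or unnecessary records
            }
            String accountType = fields[2];                   // assumed position of the account type field
            context.write(new Text(accountType), value);
        }
    }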
Confidential, Camp Hill, PA
Big Data/Hadoop Developer
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, ZooKeeper, and Sqoop.
- Extensively involved in installing and configuring the Cloudera Distribution of Hadoop (CDH) 2 and 3, including the NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.
- Created POC to store Server Log data in MongoDB to identify System Alert Metrics.
- Configured ZooKeeper, Cassandra, and Flume on the existing Hadoop cluster.
- Implemented Hadoop framework to capture user navigation across the application to validate the user interface and provide analytic feedback/result to the UI team.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Performed analysis on the unused user navigation data by loading it into HDFS and writing MapReduce jobs. The analysis provided inputs to the new APM front-end developers and the Lucent team.
- Worked with Cassandra for non-relational data storage and retrieval on enterprise use cases.
- Wrote MapReduce jobs using the Java API and Pig Latin (see the illustrative sketch at the end of this section).
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Used Flume to collect, aggregate and store the web log data onto HDFS.
- Wrote Pig scripts to run ETL jobs on the data in HDFS.
- Used Hive to do analysis on the data and identify different correlations.
- Involved in HDFS maintenance and administration through the Hadoop Java API.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Worked on NoSQL databases including HBase and MongoDB. Configured a MySQL database to store Hive metadata.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Wrote Hive queries for data analysis to meet business requirements.
- Automated the jobs that pull data from an FTP server and load it into Hive tables, using Oozie workflows.
- Involved in creating Hive tables and working with them using HiveQL.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Supported MapReduce programs running on the cluster.
- Maintained and monitored the clusters.
- Utilized Agile Scrum methodology to help manage and organize a team of 4 developers, with regular code review sessions.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
Environment: Hadoop, MapReduce, HDFS, Flume, Pig, Hive, HBase, Sqoop, ZooKeeper, Cloudera, Oozie, MongoDB, Cassandra, SQL*Plus, NoSQL, ETL, MySQL, Agile, Windows, UNIX Shell Scripting.
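Illustrative sketch of a web-log MapReduce job of the kind written for this project (log layout, field position, and class names are assumed for illustration):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Hypothetical job: counts web-log hits per HTTP status code from Flume-delivered access logs.
    public class LogStatusCount {
        public static class StatusMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            @Override
            protected void map(LongWritable key, Text line, Context ctx)
                    throws IOException, InterruptedException {
                String[] parts = line.toString().split(" ");  // assumed space-delimited access log
                if (parts.length > 8) {
                    ctx.write(new Text(parts[8]), ONE);       // assumed position of the status code
                }
            }
        }
        public static class StatusReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text status, Iterable<IntWritable> counts, Context ctx)
                    throws IOException, InterruptedException {
                int total = 0;
                for (IntWritable c : counts) {
                    total += c.get();
                }
                ctx.write(status, new IntWritable(total));
            }
        }
    }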
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Developed data pipelines using Sqoop and Flume to store data in HDFS for further processing with Spark.
- Created Hive tables with periodic backups and wrote complex Hive/Impala queries to run on Impala.
- Implemented partitioning and bucketing in Hive, using optimized file formats and compression techniques.
- Wrote a Python script to convert Autosys jobs and HDFS directory location paths from old standards to new ones.
- Wrote Python scripts to retrieve the YARN job list for performance metrics.
- Created Hive generic UDFs to process business logic that varies based on policy (see the illustrative sketch at the end of this section).
- Customized the MapReduce framework at different levels, such as input formats, data types, custom SerDes, and partitioners.
- Pushed the data to Windows mount location for Tableau to import it for reporting.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Migrated MapReduce jobs to Spark RDD transformations and streamed data using Spark Streaming.
- Worked on joins to create Hive look up tables.
- Developed Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Analyzed large data sets by running Hive query scripts.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Developed Hive scripts that accept dynamic parameters via hivevar.
- Created partitioned tables in Hive for best performance and faster querying.
- Configured build scripts for multi module projects with Maven.
- Automated the process of scheduling workflow using Oozie and Autosys.
- Prepared Unit test cases and performed unit testing.
- Created external and partitioned tables in Hive for querying purposes.
Environment: Hadoop, Cloudera, HDFS, Hive, Spark, Sqoop, Flume, Java, Scala, Shell scripting, Impala, Eclipse, Tableau, MySQL.
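Illustrative sketch of a Hive generic UDF as referenced in this section (the function name and normalization rule are assumed; the actual UDFs carried policy-specific business logic):

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.Text;

    // Hypothetical generic UDF: normalizes a policy code string passed from a Hive query.
    public class PolicyCodeUDF extends GenericUDF {
        @Override
        public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            if (args.length != 1) {
                throw new UDFArgumentException("policy_code() expects exactly one argument");
            }
            return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
        }

        @Override
        public Object evaluate(DeferredObject[] args) throws HiveException {
            Object value = args[0].get();
            if (value == null) {
                return null;                                             // pass nulls through unchanged
            }
            return new Text(value.toString().trim().toUpperCase());      // assumed normalization rule
        }

        @Override
        public String getDisplayString(String[] children) {
            return "policy_code(" + String.join(", ", children) + ")";
        }
    }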
Confidential, Dallas, TX
Java Developer
Responsibilities:
- Performed Code Reviews and responsible for Design, Code and Test signoff.
- Assisted the team in development, clarified design issues, and fixed defects.
- Involved in designing test plans, test cases and overall Unit and Integration testing of system.
- Developed business-tier logic using stateful and stateless session beans.
- Developed web services using JAX-RPC, JAXP, WSDL, JSON, SOAP, RESTful services, and XML to obtain quotes and receive quote updates, customer information, status updates, and confirmations.
- Extensively used SQL queries, PL/SQL stored procedures & triggers in data retrieval and updating of information in the Oracle database using JDBC.
- Wrote, configured, and maintained Hibernate configuration files, and wrote and updated Hibernate mapping files for each persisted Java object.
- Created CRUD applications using Groovy/Grails
- Wrote Hibernate Query Language (HQL) queries and tuned them for better performance (see the illustrative sketch at the end of this section).
- Wrote test cases using JUnit, following test-first development.
- Used Rational Clear Case & PVCS for source control. Also used Clear Quest for defect management.
- Wrote build files using Ant and used Maven in conjunction with Ant to manage builds.
- Ran nightly builds to deploy the application on different servers.
Environment: EJB, Web Services, Hibernate, Struts, JSP, JMS, JNDI, JDBC, WebLogic, SQL, PL/SQL, Oracle, Sybase, XML, XSLT, WSDL, SOAP, RESTful, Grails, UML, Rational Rose, WebLogic Workshop, OptimizeIt, Ant, JUnit, ClearCase, PVCS, ClearQuest, Windows XP, Linux.
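Illustrative sketch of an HQL lookup of the kind described in this section (the Quote entity, its properties, and the DAO name are assumed; a mapped Quote class and hibernate.cfg.xml would be required at runtime):

    import java.util.List;
    import org.hibernate.Query;
    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;

    // Hypothetical DAO method: fetches quotes for a customer via HQL, bounded for performance.
    public class QuoteDao {
        private final SessionFactory sessionFactory =
                new Configuration().configure().buildSessionFactory();  // reads hibernate.cfg.xml

        public List<?> findQuotesByCustomer(Long customerId) {
            Session session = sessionFactory.openSession();
            try {
                Query query = session.createQuery(
                        "from Quote q where q.customerId = :customerId order by q.createdDate desc");
                query.setParameter("customerId", customerId);
                query.setMaxResults(50);  // cap the result set as a simple tuning measure
                return query.list();
            } finally {
                session.close();
            }
        }
    }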
