Hadoop/Big Data Developer Resume
Princeton, NJ
OBJECTIVE
- Experienced Hadoop developer with a Master's in Computer Science, seeking a Hadoop Developer position with an organization where knowledge of Java, Spark, Hive, Pig, Python, Scala, and HBase can be applied to improve analytic systems and collaborate on program development.
SUMMARY
- Around 6.4 years of IT experience with deep knowledge of the Big Data/Hadoop ecosystem.
- Expertise in setting up Hadoop standalone and multi-node clusters.
- Core understanding of the main Hadoop modules, including Hadoop Common, HDFS, MapReduce, and YARN, and of the JobTracker, TaskTracker, NameNode, and DataNode daemons.
- Analyzed big data using R.
- Experienced in using Oozie to automate data loading into HDFS and Pig to pre-process the data.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW (Enterprise Data Warehouse).
- Imported data from MySQL to Hive using Sqoop and used Impala to optimize query performance.
- Experience importing and exporting structured data with Sqoop and unstructured/semi-structured data with Flume between HDFS and relational/non-relational database systems, and in analyzing and querying that data with Hive Query Language and Pig Latin.
- Proficient in using Hue for storage and processing across Hadoop ecosystem components.
- Hands-on experience with batch and real-time data streaming into HDFS using Kafka and Spark; see the sketch after this summary.
- Experience loading data into HDFS using PySpark and managing dependencies so they are available to PySpark jobs.
- Expertise in using Tableau reporting tools.
- Designed and developed ETL jobs using various transformations per business requirements and ETL mapping specifications.
- Experience in all phases of the software development life cycle under both Waterfall and Agile.
- Developed wireframes, mockups, use cases, flow charts, working prototypes, and UI designs, along with the backend, for a Pharmacy Management System.
- Excellent verbal, written, and interpersonal skills, with analytical problem-solving ability and strong team-management skills.
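A minimal sketch of the Kafka-to-Spark streaming pattern mentioned above, using the Spark Structured Streaming Kafka source; the broker address, topic name, and HDFS paths are illustrative assumptions rather than details from a specific engagement.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import static org.apache.spark.sql.functions.col;

public class KafkaToHdfs {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-to-hdfs").getOrCreate();

        // Subscribe to a Kafka topic (broker and topic names are assumptions).
        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")
                .option("subscribe", "events")
                .load();

        // Kafka delivers keys/values as bytes; cast to strings before landing.
        Dataset<Row> parsed = events.select(
                col("key").cast("string"), col("value").cast("string"));

        // Write to HDFS as Parquet, checkpointing so the query can restart safely.
        StreamingQuery query = parsed.writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/raw/events")
                .option("checkpointLocation", "hdfs:///checkpoints/events")
                .start();
        query.awaitTermination();
    }
}
```

The checkpoint location is what lets the sink recover its offset progress after a failure, which is the usual reason to prefer this pattern over hand-rolled Kafka consumers.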
TECHNICAL SKILLS
Big Data Ecosystems: Hadoop, MapReduce, HDFS, YARN, HBase, ZooKeeper, Hive, Pig, Sqoop, Impala, Oozie, Flume, Solr, Spark with Scala/Python, Kafka
Programming Languages: R, Java, C/C++, Scala
Scripting Languages: PHP, JavaScript, XML, HTML, Python and Bash
Databases: NoSQL (HBase and MongoDB), Oracle, Hive, Impala
Reporting Tools: Tableau
Platforms: Windows, Linux
Methodologies: Agile, UML, Design Patterns
Spark Technologies: Spark Core, Spark SQL, Spark Streaming
Business Process Tools: MS Excel, MS Word, MS PowerPoint, MS Outlook, MS Visio
PROFESSIONAL EXPERIENCE
Confidential, PRINCETON, NJ
Hadoop/Big Data Developer
Responsibilities:
- Work on stories covering the ingestion, transformation, and timely publication of data.
- Program in a Unix/Linux environment, including Python and shell scripting.
- Supervise complex data workflows integrating Kafka, Apache Spark, HBase, Hive, and other similar systems.
- Use the Impala and Hive shells for transformations within shell scripts.
- Involved in creating workflows that run multiple Hive jobs, PySpark jobs, and shell scripts independently, triggered by time and data availability.
- Developed Scala/Python scripts using DataFrames/Spark SQL and RDDs in Spark for data aggregation and queries; see the sketch after this list.
- Developed Spark code in Scala/Python with Spark SQL for faster testing and processing of data, loading data into Spark RDDs and performing in-memory computation to generate output with a smaller memory footprint.
- Experienced in handling large datasets during the ingestion process itself, using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations.
- Developed Oozie workflow jobs to execute Hive, Sqoop, and Spark actions.
- Developed and maintained ETL mappings to extract the data from multiple source systems.
- Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
- Worked extensively on code reviews and code remediation to meet coding standards.
- Used SFTP to send and receive files between various upstream and downstream systems.
- Extracted meaningful data from CSV, text, and mainframe files and generated reports for data analysis.
- Utilized Python to run scripts and generate tables and reports.
- Provided upper management with daily updates on project progress.
- Coordinate with the Agile team to effectively meet all commitments and project requirements.
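A minimal sketch of the DataFrame/Spark SQL aggregation pattern referenced in the list above; the input path, column names, and output layout are illustrative assumptions.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.*;

public class DailyAggregation {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("daily-aggregation").getOrCreate();

        // Read previously ingested, refined data (path and schema are assumptions).
        Dataset<Row> txns = spark.read().parquet("hdfs:///data/refined/transactions");

        // DataFrame/Spark SQL aggregation: Catalyst optimizes the plan and the
        // computation stays in memory across the shuffle.
        Dataset<Row> daily = txns
                .filter(col("status").equalTo("COMPLETE"))
                .groupBy(col("account_id"), to_date(col("event_ts")).alias("event_date"))
                .agg(sum("amount").alias("total_amount"),
                     count(lit(1)).alias("txn_count"));

        // Publish the result partitioned by date for downstream consumers.
        daily.write().mode("overwrite")
                .partitionBy("event_date")
                .parquet("hdfs:///data/publish/daily_totals");

        spark.stop();
    }
}
```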
Environment: Unix shell scripting, HDFS, Sqoop, Hive, HBase, Impala, Spark, Spark SQL, Spark Streaming, Python, Scala, Solr, Kafka, Flume, Oozie, Airflow, Syncsort DMX-h, Cloudera, Agile.
Confidential
Hadoop Developer/Graduate Assistant/Masters
Responsibilities:
- Installed and configured Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, Sqoop, HBase, Flume, and ZooKeeper.
- Involved in requirement analysis, design, and development.
- Worked on developing ETL processes to load data into HDFS using Sqoop and export the results back to MySQL.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
- Involved in creating Hive tables, loading data, and writing Hive queries that run internally as MapReduce jobs; see the sketch after this list.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Worked with Impala for the data retrieval process.
- Exported data from Impala to the Tableau reporting tool and created dashboards over a live connection.
- Used Flume to collect and aggregate web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS.
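A minimal sketch of the Hive table creation and query pattern referenced above, driven through the HiveServer2 JDBC driver; the connection URL, credentials, table definition, and HDFS location are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveLoadAndQuery {
    public static void main(String[] args) throws Exception {
        // Connect to HiveServer2 (host, port, and credentials are assumptions).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/default", "etl_user", "");
        try (Statement stmt = conn.createStatement()) {
            // External table over data already landed in HDFS (e.g., by Sqoop).
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS web_orders ("
                    + "order_id BIGINT, customer_id BIGINT, amount DOUBLE) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                    + "LOCATION '/user/etl/web_orders'");

            // Hive compiles this aggregation into MapReduce jobs on the cluster.
            ResultSet rs = stmt.executeQuery(
                    "SELECT customer_id, SUM(amount) FROM web_orders GROUP BY customer_id");
            while (rs.next()) {
                System.out.println(rs.getLong(1) + "\t" + rs.getDouble(2));
            }
        }
        conn.close();
    }
}
```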
Environment: Cloudera CDH4.3, Hadoop, Pig, Hive, MapReduce, HDFS, Sqoop, Impala, Tableau, Flume, Oozie, Linux.
Confidential
Hadoop Developer / Team Leader
Responsibilities:
- Developed several advanced MapReduce programs to process received data files; see the sketch after this list.
- Developed MapReduce programs for data analysis and data cleaning.
- Firm knowledge of the summarization patterns used to calculate aggregate statistical values over a dataset.
- Experience implementing joins in dataset analysis to discover interesting relationships.
- Fully involved in the requirement analysis phase.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Partitioned Hive tables and ran the scripts in parallel to reduce their run time.
- Strong expertise in Hive internal and external tables; created Hive tables to store processed results in tabular format.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Developed Pig scripts and Pig UDFs to load data files into Hadoop.
- Integrated data quality plans as a part of ETL processes.
- Developed and tested extraction, transformation, and load (ETL) processes.
- Analyzed the data by performing Hive queries and running Pig scripts.
- Developed Pig Latin scripts for the analysis of semi-structured and unstructured data.
- Strong knowledge of building complex data pipelines using transformations, aggregations, cleansing, and filtering.
- Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
- Involved in loading data from the edge node to HDFS using shell scripting.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Experience in managing and reviewing Hadoop log files.
- Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups.
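A minimal sketch of the kind of log-analysis MapReduce job referenced above, written against the classic Hadoop 1.x Java API used in this role; the access-log field layout and the input/output paths are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogStatusCount {
    // Emit (HTTP status code, 1) per access-log line; the field index assumes
    // a common log format and is an illustrative assumption.
    public static class StatusMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text status = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(" ");
            if (fields.length > 8) {
                status.set(fields[8]);
                ctx.write(status, ONE);
            }
        }
    }

    // Sum the counts per status code; also reusable as a combiner.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable v : values) total += v.get();
            ctx.write(key, new IntWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "log-status-count");
        job.setJarByClass(LogStatusCount.class);
        job.setMapperClass(StatusMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```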
Environment: Hadoop 1.1.1, Java, Apache Pig 0.10.0, Apache Hive 0.10.0, MapReduce, HDFS, Flume 1.4.0, GIT, UNIX Shell scripting, PostgreSQL, Linux.
Confidential
Junior Developer
Responsibilities:
- Excellent communication skills with great leadership qualities.
- Analyzed the object-oriented design and presented UML sequence and class diagrams.
- Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
- Designed the user interface and implemented validation checks using JavaScript.
- Managed connectivity using JDBC for querying, inserting, and data management, including triggers and stored procedures; see the sketch after this list.
- Developed components using Java multithreading.
- Developed various EJBs (session and entity beans) for handling business logic and data manipulations from database.
- Involved in the design of JSPs and Servlets for navigation among the modules.
- Designed cascading style sheets and the XSLT/XML portions of the Order Entry and Product Search modules, and performed client-side validation with JavaScript.
- Hosted the application on WebSphere.
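A minimal sketch of the JDBC access pattern referenced above, pairing a parameterized insert with a stored-procedure call in a single transaction; the connection details, table, and the recalc_inventory procedure are hypothetical names used only for illustration.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class OrderDao {
    public static void main(String[] args) throws Exception {
        // Oracle connection details are assumptions.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "secret");
        conn.setAutoCommit(false);
        try {
            // Parameterized insert keeps the statement safe and reusable.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO orders (order_id, product_id, qty) VALUES (?, ?, ?)")) {
                ps.setLong(1, 1001L);
                ps.setLong(2, 42L);
                ps.setInt(3, 3);
                ps.executeUpdate();
            }
            // Business logic kept in PL/SQL is invoked via CallableStatement
            // (recalc_inventory is a hypothetical stored procedure).
            try (CallableStatement cs = conn.prepareCall("{call recalc_inventory(?)}")) {
                cs.setLong(1, 42L);
                cs.execute();
            }
            conn.commit();
        } catch (Exception e) {
            conn.rollback();
            throw e;
        } finally {
            conn.close();
        }
    }
}
```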
Environment: J2EE, Java/JDK, PL/SQL, JDBC, JSP, Servlets, JavaScript, EJB, JavaBeans, UML, XML, XSLT, Oracle 9i, HTML/DHTML.
Confidential
Junior Java Developer
Responsibilities:
- Quality/process improvement: participated in group improvement activities and initiatives to improve process and product quality in pursuit of excellence.
- Worked with developers on other component teams to ensure consistent integration of services across teams.
- Worked as a core Java developer, independently and as part of a team, to support and enhance a Securities Lending application.
- Designed and developed code artifacts and created release notes and support documentation.
- Experience in the analysis, design, development, testing, and implementation of web-based applications.
- Good technical skills and a strong ability to analyze and formulate results.
- Proficiency in software design and development across the phases of the software development life cycle.
- Hands-on expertise in developing Windows applications and working with databases such as MS SQL Server 2005/2008.
- Excellent communication skills with great leadership qualities.
Environment: Java, JavaScript, Oracle, SQL, Windows.