Hadoop Spark Developer Resume
Chicago, IL
PROFESSIONAL SUMMARY:
- Over 8 years of professional experience in IT, including analysis, design, coding, testing, implementation, and training in Java and Big Data technologies, working with Apache Hadoop ecosystem components
- Extensive experience with major components of the Hadoop ecosystem such as MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Storm, YARN, Spark, and Scala.
- Extensive experience with real-time streaming applications using Kafka, Flume, Storm, and Spark Streaming
- Expertise in Kerberos Security Implementation
- Experience in analyzing data using HiveQL, Pig Latin, HBase and custom Map Reduce programs in Java.
- Good knowledge of creating event-processing data pipelines using Flume, Kafka, and Storm.
- Hands on experience with AWS components like EC2, EMR and S3.
- Expertise in writing Hadoop jobs for analyzing data using MapReduce, Hive, Pig, HBase and Sqoop.
- Good working expertise in handling multiple terabytes of structured and unstructured data on large cluster environments
- In-depth understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce
- Good understanding of HDFS Designs, Daemons, HDFS High Availability (HA).
- Good understanding and working experience on Hadoop Distributions like Cloudera.
- Expertise in data transformation & analysis using Pig, Hive and Sqoop.
- Experience in importing and exporting multiple terabytes of data using Sqoop between HDFS and relational database systems (RDBMS)
- Experience in implementation of complete Big Data solutions, including data acquisition, storage, transformation, and analysis
- Hands-on experience with Avro and Parquet file formats, dynamic partitioning, and bucketing for best practices and performance improvement.
- Experience in database design using stored procedures, functions and triggers, and strong experience writing complex queries for DB2 and SQL Server
- Developed Spark SQL programs for handling different data sets for better performance.
- Experience in building web services using both SOAP and RESTful services in Java.
- Good hands-on experience in configuration, deployment and management of enterprise applications on application servers like WebSphere and JBoss and web servers like Apache Tomcat.
- Experience in analyzing large scale data to identify new analytics, insights, trends and relationships with a strong focus on data clustering.
- Experience in implementing custom Partitioners and Combiners for effective data distribution.
- Experience loading tuple-structured data into Pig and generating tuples from flat data.
- Experience in processing semi-structured data (XML and JSON) in Hive.
- Experience in writing Hive UDFs, UDTFs and UDAFs (a brief sketch follows this summary).
- Experience in writing shell scripts to dump shared data from MySQL and Oracle servers to HDFS.
- Good understanding of compression codecs such as Snappy and LZO and of splittable container formats such as Avro
- Good understanding of configuring simple to complex workflows using Oozie.
- Good understanding of NoSQL databases like HBase and MongoDB
- Experience in designing and developing data models for Cassandra.
- Proficient in working with various IDEs and tools including Eclipse Galileo, IBM Rational Application Developer (RAD), and VMware.
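The UDF bullet above refers to Hive user-defined functions in general; as one hedged illustration only (the class name and behavior are hypothetical, not taken from any project listed here), a minimal Hive UDF in Java might look like this:

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example: trims and upper-cases a string column from Hive.
// Hive calls evaluate() once per row; returning null produces a NULL value.
public final class NormalizeText extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}

After packaging the class into a JAR, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in a query.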
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Spark Core, Spark Streaming, Spark SQL, Hive, Tez, Pig, Sqoop, Flume, Kafka, Oozie, NiFi, ZooKeeper, Docker
AWS Components: EC2, EMR, S3, RDS, Redshift, DynamoDB, Lambda, SNS, SQS
Languages and Technologies: Java, C, C++, XML, SQL, Shell Script, Scala, J2EE, Python, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts
Operating Systems: Linux, Windows, CentOS, Ubuntu, RHEL
Databases: MySQL, Oracle 11g/10g/9i, MS SQL Server, HBase, Cassandra, MongoDB
Tools: WinSCP, Wireshark, JIRA, IBM Tivoli
Scripting Languages: PHP
Others: HTML, XML
WORK EXPERIENCE:
Confidential, Chicago, IL
Hadoop Spark Developer
Responsibilities:
- Responsible for managing incoming government medical data from different sources; also handled data coming from multiple sources flowing to multiple targets.
- Expert in using the IBM masking tool, which masks data according to business requirements.
- Masked millions of records of US government data.
- Implemented Hadoop data pipeline to identify customer behavioral patterns, improving UX on e-commerce website
- Developed MapReduce jobs in Java for log analysis, analytics, and data cleaning
- Performed big data processing using Hadoop, MapReduce, Sqoop, Oozie, and Impala
- Imported data from MySQL to HDFS, using Sqoop to load data
- Developed and designed a 10-node Hadoop cluster for sample data analysis
- Regularly tuned performance of Hive and Pig queries to improve data processing and retrieval
- Ran Hadoop streaming jobs to process terabytes of XML data
- Created visualizations and reports for the business intelligence team, using Tableau
- Experienced in handling large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective & efficient joins and transformations during the ingestion process itself.
- Created Hive tables, loaded data using Sqoop, and worked on them using HiveQL
- Responsible for developing custom UDFs, UDAFs and UDTFs in Pig and Hive.
- Optimized Hive queries using various file formats like JSON, Avro, ORC, and Parquet
- Migrated existing MapReduce programs to Spark using Scala and Python
- Developed Scala scripts and UDFs using DataFrames in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
- Implemented Spark SQL to connect to Hive, read the data, and distribute processing for high scalability (see the sketch after this list)
- Analyzed tweet JSON data using the Hive SerDe API to deserialize and convert it into a readable format
- Processed application web logs using Flume and loaded them into Hive for analysis
- Implemented RESTful Web Services to interact with Oracle/Cassandra to store/retrieve the data.
- Generated detailed design documentation for the source-to-target transformations.
- Involved in planning process of iterations under the Agile Scrum methodology.
- Prepared an ETL framework with the help of Sqoop, Pig and Hive to frequently bring in data from the source and make it available for consumption.
- Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
- Developed Merge jobs in Python to extract and load data from MySQL database to HDFS
- Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files.
- Developed Python scripts to monitor the health of MongoDB databases and perform ad-hoc backups using mongodump and mongorestore.
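As a hedged illustration of the Spark-SQL-over-Hive pattern described above (the actual work used Scala on Spark 1.6, where HiveContext played this role; the table, column, and output path below are assumptions):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveAggregationJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("HiveAggregationJob")
                .enableHiveSupport()          // lets spark.sql() see Hive metastore tables
                .getOrCreate();

        // Aggregate events per user straight from a Hive table (names are illustrative).
        Dataset<Row> perUser = spark.sql(
                "SELECT user_id, COUNT(*) AS events FROM web_logs GROUP BY user_id");

        // Write the result back to HDFS as Parquet; a downstream Sqoop export
        // (as described above) could then push it into the OLTP system.
        perUser.write().mode("overwrite").parquet("/data/output/events_per_user");

        spark.stop();
    }
}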
Confidential, New York City, NY
Hadoop Developer
Responsibilities:
- Responsible to manage data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Installed and configured Pig and wrote Pig Latin scripts.
- Wrote MapReduce jobs using Pig Latin.
- Responsible for loading customer data from SAS to MS SQL Server 2016, performing data massaging, mining and cleansing, then exporting to HDFS and Hive using Sqoop
- Wrote Pig scripts to process credit card and debit card transactions for active customers by joining the data from HDFS and Hive using HCatalog for various merchants
- Responsible for writing a Lucene search program for high-performance, full-featured text search of merchants
- Wrote Python UDFs using streaming to process regular expressions and return valid merchant codes and names
- Wrote Java UDFs in Pig and Hive to convert card names to uppercase and format dates
- Responsible for building the Docker containers and scheduling the Oozie workflows to run the above three sprints
- Responsible for creating a data pipeline using Kafka, Flume and Spark Streaming on a Twitter source to collect sentiment tweets of target customers about reviews (see the sketch after this list)
- Implemented Kerberos security
- Created users in Active Directory and mapped the roles in each group for the users in Apache Sentry
- Experience in using Java API to load the data into Cassandra clusters.
- Involved in LDAP implementation for different types of accesses in AD for Hue, Hive, Pig
- Complete ownership of Hive and Spark tuning, with partitioning/bucketing of ORC tables and executor/driver memory settings
- Wrote Hive UDFs to extract data from staging tables
- Prepared a Spark build from the source code and ran the Pig scripts using Spark rather than MR jobs for better performance
- Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig and Hive.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Involved in installing Hadoop Ecosystem components under Cloudera distribution.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Hands-on experience with Apache Spark using Scala; implemented Spark solutions to enable real-time reporting from Cassandra data.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked on them using HiveQL.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Used Storm as an automatic retry mechanism to repeat attempts to download and manipulate the data when there was a hiccup.
- Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
- Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers.
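A minimal Java sketch of the Kafka-to-Spark-Streaming consumer referenced in the pipeline bullet above; the broker address, topic name, and group id are assumptions, and the Flume leg and sentiment scoring of the pipeline are not shown:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class TweetStreamJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("TweetStreamJob");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092"); // assumption
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "tweet-consumers");          // assumption
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Arrays.asList("tweets"), kafkaParams)); // topic name is illustrative

        // Count tweets per micro-batch; a real job would score sentiment and persist results.
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}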
Environment: CDH5, MapR, MapReduce, HDFS, Hive, Pig, Kerberos, Apache Sentry, HBase, Sqoop, Spark, Oozie, Linux, XML, MySQL, MySQL Workbench, PL/SQL, SQL connector
Confidential, CA
Hadoop Developer
Responsibilities:
- Involved in loading data from LINUX file system to HDFS.
- Importing and exporting data into HDFS and Hive using Sqoop and Flume.
- Proficient in using Cloudera Manager, an end to end tool to manage Hadoop operations.
- Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
- Wrote MapReduce jobs to generate reports on the number of activities created on a particular day from data dumped from multiple sources, with the output written back to HDFS (see the sketch after this list)
- Responsible for loading the customers' data and event logs from Oracle and Teradata databases into HDFS using Sqoop
- Involved in initiating and successfully completing a proof of concept on Sqoop for pre-processing, increased reliability, and ease of scalability over a traditional Oracle database.
- End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
- Developed Pig UDFs to pre-process the data for analysis.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
- Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Exported analyzed data to HDFS using Sqoop for generating reports.
- Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
- Developed Hive queries for the analysts.
- Provided cluster coordination services through ZooKeeper.
- Wrote Storm spouts and bolts to collect real-time customer data from the Kafka broker, process it, and store it in HBase.
- Analyzed log files and processed them through Flume
- Experienced in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization.
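A minimal Java sketch of the kind of MapReduce reporting job described above (counting activities per day); the input layout, with a yyyy-MM-dd date as the first tab-separated field, is an assumption:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ActivitiesPerDay {

    // Emits (day, 1) for every activity record; assumes the date is the first field.
    public static class DayMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text day = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                day.set(fields[0]);
                context.write(day, ONE);
            }
        }
    }

    // Sums the counts for each day.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable v : values) {
                total += v.get();
            }
            context.write(key, new IntWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "activities-per-day");
        job.setJarByClass(ActivitiesPerDay.class);
        job.setMapperClass(DayMapper.class);
        job.setCombinerClass(SumReducer.class);   // a combiner is safe for a pure sum
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}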
Environment: UNIX Scripting, Hadoop, MapReduce, HDFS, Pig, Sqoop, Hive, Oracle, Teradata and Eclipse
Confidential
Sr. Java Developer
Responsibilities:
- Analyzed requirements, created the detailed technical design document and functional specification, and reviewed changes; used XML to build the data transfer logic that converts other formats to XML files for the billing module.
- Involved in the development and deployment of Java, JSP and Servlet components.
- Worked on the Oracle database to design the database schema and create the database structure, tables and relationship diagrams.
- Handled overall exception handling and logging for the application. Created the style sheet and XSLT for presentation layer controls. Developed end-to-end functionality (presentation layer, service layer, data access layer and database activities) for the assigned modules.
- Performed enhancements to existing SOAP web services for online card payments
- Performed enhancements to existing payment screens by developing servlets and JSP Pages
- Involved in end to end batch loading process using ETL Informatica
- Transformed the Use Cases into Class Diagrams, Sequence Diagrams and State diagrams
- Developed Validation Layer providing Validator classes for input validation, pattern validation and access control
- Used AJAX calls to dynamically assemble the data in JSP page, on receiving user input.
- Used Log4j to print debug, warning and info log messages on the server console (a minimal servlet/Log4j sketch follows this list).
- Involved in creation of Test Cases for JUnit Testing and carried out Unit testing.
- Used ClearCase as the configuration management tool for code versioning and release deployment on WebSphere Application Server 7.0
- Used Maven for deployment of the web application on the WebSphere server.
- Interacted with business team to transform requirements into technical solutions.
- Responsible for writing SQL queries, triggers and functions to manipulate database objects.
- Developed the user interface for the application using HTML, DHTML and Java Server Pages (JSP).
- Developed the user interface using JSP, Servlets, JSTL, Struts and JavaScript, and created the presentation layer and business logic layer. Was responsible for client-side and server-side validation for the modules.
- Developed internal testing tool, using JSP to place test XML orders to framework application.
- Acted as a single point contact for the Database related activities like developing/creating tables, procedures and functions for the java developers. Involved in designing of the application using UML (Unified Modeling Language).
- Developed complex web-based screens in JSP with embedded tag libraries and Struts-based tags/classes such as Action, ActionForm, ActionMapping and ActionErrors.
- Created JavaScript to manipulate DOM objects and developed unit test cases to test service and server components, and provided production support fixing production issues and bugs.
- Developed various reports for stakeholders on a weekly, monthly and ad hoc basis
- Maintained the code base using Microsoft VSS and acted as librarian, integrating and checking in all peers' code to the repository.
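A minimal, hypothetical servlet illustrating the servlet/JSP and Log4j work described above; the class name, request parameter, and JSP path are assumptions:

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.log4j.Logger;

public class PaymentStatusServlet extends HttpServlet {
    private static final Logger LOG = Logger.getLogger(PaymentStatusServlet.class);

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String orderId = request.getParameter("orderId"); // parameter name is illustrative
        LOG.info("Payment status requested for order " + orderId);

        if (orderId == null || orderId.trim().isEmpty()) {
            LOG.warn("Missing orderId parameter");
            response.sendError(HttpServletResponse.SC_BAD_REQUEST, "orderId is required");
            return;
        }

        // Forward to a JSP in the presentation layer (path is illustrative).
        request.setAttribute("orderId", orderId);
        request.getRequestDispatcher("/WEB-INF/jsp/paymentStatus.jsp")
               .forward(request, response);
    }
}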
Environment: Java, JSP, Servlets, WebSphere Application Server, Eclipse, Struts, JavaScript, ClearCase, Maven, Oracle, PL/SQL and JDBC.
Confidential
Java Developer
Responsibilities:
- Enhancement of the System according to the customer requirements.
- Developed Servlets and Java Server Pages (JSP) for Presentation Layer
- Wrote procedures, functions and triggers as part of the database access layer
- Developed PL/SQL queries to generate reports based on client requirements.
- Created test case scenarios for functional testing.
- Used HTML and JavaScript validation in JSP pages.
- Helped design the database tables for optimal storage of data.
- Coded JDBC calls in the Servlets to access the Oracle database tables (see the sketch after this list).
- Responsible for Integration, unit testing, system testing and stress testing for all the phases of project.
- Prepared final guideline document that would serve as a tutorial for the users of this application
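A minimal, hypothetical JDBC lookup of the kind used in the servlets above; the JDBC URL, credentials, table, and column names are assumptions, and try-with-resources is used for brevity (Java 1.5-era code would close resources in finally blocks):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CustomerDao {

    private static final String URL = "jdbc:oracle:thin:@//dbhost:1521/ORCL"; // assumption
    private static final String USER = "app_user";                            // assumption
    private static final String PASSWORD = "app_password";                    // assumption

    // Looks up a single customer name by primary key using a parameterized query.
    public String findCustomerName(long customerId) throws SQLException {
        String sql = "SELECT name FROM customers WHERE customer_id = ?";
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setLong(1, customerId);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}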
Environment: Java 1.5, Servlets, J2EE 1.4, JDBC, SOAP, Oracle 10g, PL/SQL, HTML, JSP, Eclipse, UNIX