Sr. Hadoop Developer Resume
Phoenix, AZ
PROFESSIONAL SUMMARY:
- 8+ years of experience as a software professional spanning requirement gathering, analysis, design, implementation, and testing of software products using Java/J2EE technologies and Big Data technologies in the Hadoop ecosystem.
- Over 4 years of experience in working with different Hadoop ecosystem components such as HDFS, MapReduce, HBase, Spark, Yarn, Kafka, Zookeeper, PIG, HIVE, Sqoop, Storm, Oozie, and Flume.
- Good experience in creating data ingestion pipelines, data transformations, data management and governance, and real-time streaming engines at an enterprise level.
- Expertise in Java application development and client/server applications using Core Java, J2EE technologies, web services, REST services, Oracle, SQL Server, and other relational databases.
- Involved in creating analytical models used for recommendations, risk modeling, fraud detection and prevention, sentiment analysis, clickstream analysis, etc.
- Very good experience in real-time data streaming solutions using Apache Spark (Spark SQL, Spark Streaming, MLlib, GraphX), Apache Storm, Kafka, and Flume.
- Very good knowledge of various big data ingestion techniques using Sqoop, Flume, Kafka, the native HDFS Java API, REST APIs, HttpFS, and WebHDFS.
- Worked on cluster maintenance, including troubleshooting, management, and performance-related configuration fine-tuning.
- Good experience in Splunk architecture and various components (indexer, forwarder, search head, deployment server), Heavy and Universal forwarder, License model.
- Experience in working with various Hadoop distributions like Cloudera, Hortonworks and MapR.
- Good experience in implementing end-to-end data security and governance within the Hadoop platform using Apache Knox, Apache Sentry, Kerberos, etc.
- Experience with various NoSQL databases such as HBase, Accumulo, Cassandra, and MongoDB.
- Worked with different file formats like AVRO, ORC, Parquet while moving data into and out of HDFS.
- Experience with Apache Phoenix to access the data stored in HBase.
- Good experience in designing, planning, administering, installing, configuring, troubleshooting, performance monitoring, and fine-tuning of Cassandra clusters.
- Excellent knowledge of CQL (Cassandra Query Language) for retrieving data from a Cassandra cluster.
- Worked with Amazon Web Services (AWS) EC2, S3, EMR, Redshift, and DynamoDB.
- Experience in software component design techniques such as UML design, use case diagrams, and component diagrams.
- Experience in data mining and business intelligence tools such as Tableau, QlikView, and MicroStrategy.
- Experience in automating tasks with Python Scripting and Shell Scripting.
- Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart. Well versed with Star-Schema & Snowflake schemas for designing the Data Marts.
- Developed ETL Scripts for Data acquisition and Transformation using Informatica and Talend.
- Good experience with and understanding of Enterprise Data Warehouse (EDW) architecture, with end-to-end knowledge of EDW functioning.
- Experienced in using Agile methodologies including Extreme Programming, Scrum, and Test-Driven Development (TDD).
- Strong knowledge of system testing, user acceptance testing, and software quality assurance best practices and methodologies.
- Experience in building and deploying web applications on multiple application servers and middleware platforms, including WebLogic, WebSphere, Apache Tomcat, and JBoss.
- Excellent communication and inter-personal skills with technical competency and ability to quickly learn new technologies as required.
- Ability to blend technical expertise with strong conceptual, business, and analytical skills to deliver quality solutions through result-oriented problem solving and leadership.
TECHNICAL SKILLS:
Hadoop/Big Data: MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Spark, Spark SQL, Spark Streaming, Kafka, Flume, Storm, Zookeeper, Phoenix, Oozie, Impala, Hue, Cloudera manager, Ambari
Distributed platforms: Cloudera, Hortonworks, MapR
Programming Languages: C, C++, C#, Java, Scala, Python, R
Java/J2EE Technologies: Servlets, JSP, JSF, JDBC, Java Beans, RMI & Web services (SOAP, RESTful)
Frameworks: Struts, Hibernate and Spring MVC
Development Tools: Eclipse, NetBeans, SBT, Ant, Maven, Jenkins, Bamboo, SOAP UI, QC, Selenium WebDriver, Jira, Bugzilla, SQL Developer, Splunk, Talend, Informatica
Methodologies: Agile/Scrum, UML, and Waterfall
NoSQL Technologies: Cassandra, MongoDB, Neo4j, HBase, Accumulo, Dynamo DB
Databases: Oracle 12c, MySQL, MS SQL Server, PostgreSQL
Web/ Application Servers: Apache Tomcat, WebLogic, WebSphere
Version Control: Git, SVN
Visualization: Tableau, MicroStrategy, and QlikView
Web Technologies: HTML, CSS, XML, JavaScript, jQuery, AngularJS, Node.js, AJAX, SOAP, and REST
Scripting Languages: Unix Shell Scripting, Perl
Operating Systems: Windows XP/Vista/7/8/10, UNIX, Linux
PROFESSIONAL EXPERIENCE:
Confidential, Phoenix, AZ
Sr. Hadoop Developer
Responsibilities:
- Created various Spark applications using Scala to enrich clickstream data with enterprise user data.
- Developed custom FTP adaptors to pull the clickstream data from FTP servers to HDFS directly using HDFS File System API.
- Implemented batch processing of jobs using Spark Scala API.
- Used Spark SQL and Data Frame API extensively to build spark applications.
- Used Spark SQL for data analysis and delivered the results to data scientists for further analysis.
- Closely worked with data science team in building Spark MLlib applications to build various predictive models.
- Developed multiple Map Reduce jobs in Java for complex business requirements including data cleansing and preprocessing.
- Migrating existing on premise applications and services to AWS.
- Implemented installation and configuration of multi-node cluster on the cloud using Amazon Web Services (AWS) on EC2.
- Used cloud computing on the multi-node cluster, deployed the Hadoop application to S3, and used Elastic MapReduce (EMR) to run MapReduce jobs.
- Developed Sqoop scripts to move data between RDBMS and HDFS/Hive tables in both directions.
- Worked on analyzing data in Hadoop clusters using big data analytics tools including MapReduce, Pig, and Hive.
- Stored the data in tabular formats using Hive tables and Hive SerDes.
- Optimized the Hive tables using optimization techniques like partitions and bucketing to provide better performance with Hive QL queries.
- Worked on implementing Hadoop streaming through Apache Kafka and Spark.
- Used Spark Streaming to consume topics from Kafka and periodically push batches of data to Spark for real-time processing.
- Involved in building and managing NoSQL databases such as HBase and Cassandra.
- Worked in Spark to read the data from Hive and write it to Cassandra using Java.
- Involved in developing Pig scripts and Pig UDFs to store unstructured data in HDFS.
- Involved in designing various stages of migrating data from RDBMS to Cassandra.
- Developed Shell Scripts and Python Programs to automate tasks.
- Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming and loading data into data warehouse.
- Involved in building the ETL architecture and Source to Target mapping to load data into Data warehouse.
- Loaded the final processed data to HBase tables to allow downstream application team to build rich and data driven applications.
- Wrote Phoenix queries on top of HBase tables to boost query performance.
- Involved in writing the shell scripts for exporting log files to Hadoop cluster through automated process.
- Created partitioned tables and loaded data using both static partition and dynamic partition methods.
- Used Oozie for automating the end to end data pipelines and Oozie coordinators for scheduling the work flows.
- Involved in cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Analyzed the Hadoop log files using Pig scripts to track down errors.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and sequence files for logs.
- Held daily scrum calls with business users, stakeholders, and the client on deliverable status, and drove periodic review meetings.
- Involved in setting up the QA environment and writing unit test cases using MRUnit.
Environment: MapR Distribution, Cassandra 2.1, HDFS, Map Reduce, Hive, Spark, Kafka, Sqoop, Pig, HBase, Oozie, Scala, Java, Eclipse, Shell Scripts, Oracle 10g, Windows, Linux, AWS.
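The clickstream enrichment described above follows a simple join pattern: look up each event's user in an enterprise profile table and attach profile attributes. A minimal pure-Python sketch of that pattern (the production jobs used Spark/Scala; all field names here are hypothetical):

```python
# Minimal sketch of the clickstream-enrichment pattern described above.
# The production jobs used Spark/Scala; this pure-Python version only
# illustrates the join logic. All field names are hypothetical.

def enrich_clicks(clicks, profiles):
    """Join raw click events with enterprise user profiles by user_id."""
    profile_by_id = {p["user_id"]: p for p in profiles}
    enriched = []
    for click in clicks:
        profile = profile_by_id.get(click["user_id"], {})
        enriched.append({
            **click,
            "segment": profile.get("segment", "unknown"),
            "region": profile.get("region", "unknown"),
        })
    return enriched

clicks = [{"user_id": 1, "page": "/home"}, {"user_id": 2, "page": "/offers"}]
profiles = [{"user_id": 1, "segment": "premium", "region": "AZ"}]
result = enrich_clicks(clicks, profiles)
```

In Spark this same lookup would typically be expressed as a DataFrame join, with the smaller profile table broadcast to the executors.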
Confidential, Irving, TX
Hadoop Developer
Responsibilities:
- Creating end to end Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities on user behavioral data.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Loaded customer profile, customer spending, and credit data from legacy warehouses onto HDFS using Sqoop.
- Installed and configured Hadoop Map Reduce, HDFS, developed multiple Map Reduce jobs in Java for data cleaning and preprocessing.
- Worked in the transition team that primarily handled the migration from Informatica to Hadoop.
- Built data pipeline using Pig and Java Map Reduce to store onto HDFS.
- Used Oozie to orchestrate the map reduce jobs that extract the data on a timely manner.
- Used pattern matching algorithms to recognize customers across different sources, built risk profiles for each customer using Hive, and stored the results in HBase.
- Used Apache Phoenix to access the data stored in HBase.
- Performed unit testing using MRUnit.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Worked on Real Time/Near Real Time data processing using Flume and Storm.
- Optimized MapReduce jobs to use HDFS efficiently by applying compression mechanisms such as Gzip, Snappy, and LZO.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Implemented Pig scripts, integrated them into Oozie workflows, and performed integration testing.
- Used Sqoop job to import the data from RDBMS using Incremental Import. Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL).
- Developed HIVE and Pig queries and provided support for data analysts.
- Extensively worked on data ingestion between heterogeneous RDBMS systems and HDFS using Sqoop.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
- Exported the result set from Hive to MySQL using Shell scripts.
Environment: Cloudera Distribution, Hadoop, Hive, Zookeeper, Map Reduce, Sqoop, Pig 0.10 and 0.11, JDK1.6, HDFS, Flume, Oozie, Informatica 9.5, DB2, HBase, PL/SQL, SQL, Shell Scripting.
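The Sqoop incremental imports mentioned above follow a simple contract: remember the last value of a check column and pull only rows beyond it on the next run (Sqoop's `--incremental append` with `--check-column`/`--last-value`). A rough Python/SQLite sketch of that semantics, with hypothetical table and column names:

```python
import sqlite3

# Sketch of Sqoop's --incremental append semantics: pull only rows whose
# check column exceeds the last recorded value. Table and column names
# are hypothetical; real imports ran via Sqoop against an RDBMS.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

def incremental_import(conn, last_value):
    """Return rows newer than last_value and the updated check-column value."""
    rows = conn.execute(
        "SELECT id, name FROM customers WHERE id > ? ORDER BY id",
        (last_value,)).fetchall()
    new_last = rows[-1][0] if rows else last_value
    return rows, new_last

first_batch, last_value = incremental_import(conn, 0)   # initial full pull
conn.execute("INSERT INTO customers VALUES (4, 'd')")   # new source row
second_batch, last_value = incremental_import(conn, last_value)
```

Persisting `last_value` between runs (Sqoop does this with saved jobs in its metastore) is what makes repeated imports pull only the delta.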
Confidential, Adrian, MI
Java/Hadoop Developer
Responsibilities:
- Developed data pipeline using Map Reduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.
- Wrote extensive Pig scripts to transform raw data from several data sources into baseline data.
- Developed simple to complex Map Reduce streaming jobs using Java language for processing and validating the data.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the HDFS and PIG to pre-process the data.
- Handled importing data from different data sources into HDFS using Sqoop and performed transformations using Hive and MapReduce.
- Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
- Created and implemented a highly scalable and reliable distributed data design using NoSQL/Cassandra technology.
- Refactored Cassandra access code to allow either Hector or Thrift access, replacing the original Thrift code interspersed throughout the application.
- Optimized application performance for a Cassandra cluster.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Solved performance issues in Hive and Pig scripts by understanding joins, grouping, and aggregation, and how they translate to MapReduce jobs.
- Identified concurrent job workloads that may impact or be impacted by failures or bottlenecks.
- Developed some utility helper classes to get data from HBase tables.
- Professional experience with HBase solutions to solve real world scaling problems.
- Implemented test scripts using MRUnit to support test driven development and continuous integration.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Took part in triage calls to handle defects reported by the testing and QA teams.
- Viewed various aspects of the cluster using Cloudera Manager.
Environment: Hadoop, Linux, CDH, MapReduce, HDFS, Hive, Pig, Shell Scripting, Sqoop, Flume, Java 7, MySQL, HBase, Oozie, Splunk, Eclipse, Oracle 11g, Maven, Log4j, Git.
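The partitioned, bucketed Hive tables used above for reporting metrics boil down to a physical layout rule: each row lands in a partition directory (e.g. by date) and within it a bucket file chosen by `hash(key) % num_buckets`. A toy Python illustration of that routing, with hypothetical column names (real tables were defined in HiveQL with `PARTITIONED BY` and `CLUSTERED BY ... INTO n BUCKETS`):

```python
# Toy illustration of how Hive lays out partitioned, bucketed data:
# each row lands in a partition directory plus a bucket file chosen by
# hash(key) % num_buckets. Column names are hypothetical; hash_key is a
# stand-in for Hive's own deterministic column hash.

NUM_BUCKETS = 4

def hash_key(key):
    # Stand-in for Hive's deterministic column hash.
    return key if isinstance(key, int) else sum(map(ord, str(key)))

def route_row(row):
    partition = f"dt={row['dt']}"                    # dynamic partition by date
    bucket = hash_key(row["user_id"]) % NUM_BUCKETS  # bucket within partition
    return f"{partition}/bucket_{bucket}"

rows = [
    {"dt": "2016-01-01", "user_id": 7},
    {"dt": "2016-01-01", "user_id": 2},
    {"dt": "2016-01-02", "user_id": 7},
]
paths = [route_row(r) for r in rows]
```

Partition pruning works because a query filtering on `dt` only reads the matching directories, and bucketing lets joins on `user_id` match bucket-to-bucket instead of shuffling whole tables.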
Ericsson
Sr. Java Developer
Responsibilities:
- Involved in the Analysis, Design, Development, and Testing phases of Software Development Lifecycle (SDLC).
- Developed user interface using JSP, JavaScript, CSS and HTML.
- Implemented AJAX to allow dynamic loading, improved interaction and rich look to the User Interface for admin portal.
- Implemented J2EE design patterns such as Singleton, Session Facade, and Data Access Object (DAO).
- Used Hibernate for object-relational mapping of Java classes.
- Used Spring 3.0 with JMS to establish interactive communication between different domains.
- Designed and developed a web-based client using Servlets, JSP, Java Script, Tag Libraries, CSS, HTML and XML.
- Designed Java classes using Spring Framework to implement the Model View Control (MVC) architecture.
- Consumed and exposed SOAP and RESTful web services.
- Wrote complex SQL queries and programmed stored procedures, packages and triggers using Oracle 10g.
- Performed Module and Unit Level Testing with JUnit and Log4j.
- Used JBoss 6.0 as the application server.
Environment: Java 1.5, JDBC, REST API, Hibernate 3, Spring 3, Servlets, JSPs, XML, XSLT, HTML, MXML, JavaScript, Maven, CVS, Log4j, JUnit, PL/SQL, Oracle 9i, JBoss 6, Eclipse IDE.
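The design patterns applied in this role (Singleton, Session Facade, DAO) were implemented in Java/J2EE; as a language-neutral illustration, the core Singleton idea can be sketched in a few lines of Python (the class name is hypothetical):

```python
# Minimal sketch of the Singleton pattern used above. The real
# implementation was in Java/J2EE; this only shows the idea: one shared
# instance, however many times the class is requested.

class ConnectionManager:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.connections = []  # state initialized exactly once
        return cls._instance

a = ConnectionManager()
b = ConnectionManager()
a.connections.append("db1")  # visible through both references
```

In the Java version, the same effect is typically achieved with a private constructor and a static accessor (or container-managed scope in Spring).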
Confidential
Java/J2EE Developer
Responsibilities:
- Worked on both the back-end and front-end development teams; took part in developing, maintaining, reviewing, and supporting quality code and services.
- Involved in Daily SCRUM meetings and weekly SPRINT Meetings.
- Developed web applications using J2EE technologies such as Java Server Pages (JSP), Servlets, and the Struts 1.2 framework.
- Implemented Action classes, Action Forms, and Struts tag libraries using the Struts framework.
- Defined XML schemas for web service messages and used them in WSDL.
- Designed and developed user interfaces using JSP, HTML, and JavaScript.
- Used JDBC, SQL, and PL/SQL programming for storing, retrieving, and manipulating data.
- Extracted, manipulated, and updated data in Oracle 10g databases.
- Extensively used the Eclipse Indigo 3.7 IDE and the Subversion (SVN) version control system for developing Java-based applications.
- Deployed web applications on the Tomcat 5.0 web server and wrote XML-based Apache Ant 1.x build scripts.
- Wrote test cases for unit testing (JUnit 4.2), module testing, and integration testing.
Environment: Java, JavaScript, HTML, JSP, Servlets, Struts1.2, Eclipse Indigo 3.7, Ant1.x, Oracle10g, Tomcat 5.0.
Confidential
Java Developer
Responsibilities:
- Involved in entire cycle of design and development.
- Wrote design documents including workflow UML diagrams.
- Involved in installing the GlassFish server and deploying applications.
- Involved in implementing service-oriented architecture (SOA).
- Involved in Agile development process.
- Implemented Mock screens for application products during design process.
- Wrote shell scripts on Linux/Unix platforms using the vi editor to process various financial files.
- Created the front-end interface using JSP, JavaScript, CSS and HTML.
- Used MVC, Value Object, and Business Delegate, DAO design patterns.
- Performed input validations using Struts Validator Framework and JavaScript.
- Created advanced SQL scripts in PL/SQL Developer to facilitate data flow into and out of Oracle.
- Created different Autosys jobs to automate the entire financial file processing system.
- Used SVN Subversion as version control for maintaining files and documents.
- Developed Ant scripts for building the application EAR and deploying it on the GlassFish server.
- Created a control file to load data from files into the database.
Environment: Java 5, J2EE, JSP, Eclipse, Java Beans, Servlet, Ant, Autosys, XML, HTML, JavaScript, PL/SQL Developer, Linux/Unix, Glassfish Server, Oracle 10g.
