Hadoop Developer Resume
Englewood, CO
PROFESSIONAL SUMMARY:
- Around 9 years of IT experience in project development, implementation, deployment and maintenance using Big Data Hadoop ecosystem technologies in the insurance, health care and retail sectors, with expertise in multiple programming languages including Java, Python and Scala.
- 5+ years of Hadoop Developer experience in designing and implementing complete end-to-end Hadoop infrastructure using HDFS, MapReduce, HBase, Spark, YARN, Kafka, ZooKeeper, Pig, Hive, Sqoop, Storm, Oozie and Flume.
- 4+ years of Java programming experience in developing web based applications and Client-Server technologies.
- In depth understanding of Hadoop Architecture and its various components such as Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager and Map Reduce concepts.
- Strong experience creating real time data streaming solutions using Apache Spark Core, Spark SQL and Data Frames.
- Hands-on experience in Installing, Configuring, Testing Hadoop Ecosystem components.
- Experience in analyzing data using Hive QL, Pig Latin and custom MapReduce programs in Java
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Experience with partitioning and bucketing in Hive; designed both managed and external tables in Hive to optimize performance.
- Collecting and aggregating a large amount of Log Data using Apache Flume and storing data in HDFS for further analysis.
- Built a dashboard to show the statistics on transaction status using Datameer and Platfora.
- Job workflow scheduling and monitoring using tools like Oozie.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Comprehensive experience in building web-based applications using J2EE frameworks like Spring, Hibernate, Struts and JMS.
- Worked in the complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) using Agile methodologies.
- Transformed some existing programs into a lambda architecture.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Experience in installation, configuration, support and monitoring of Hadoop clusters using Apache, Cloudera distributions and AWS.
- Experience in working with various Cloudera distributions (CDH4/CDH5), Hortonworks and Amazon EMR Hadoop Distributions.
- Assisted in Cluster maintenance, Cluster Monitoring, Managing and Reviewing data backups and log files.
- Experience in different layers of the Hadoop framework: Storage (HDFS), Analysis (Pig and Hive), Engineering (Jobs and Workflows).
- Expertise in optimizing traffic across network using Combiners, joining multiple schema datasets using Joins and organizing data using Partitions and Buckets.
- Hands on experience in Sequence files, RC files, Combiners, Counters, Dynamic Partitions, Bucketing for best practice and performance improvement.
- Experienced in using Integrated Development environments like Eclipse, NetBeans and IntelliJ
- Migrated data from different databases (e.g. Oracle, DB2, Cassandra, MongoDB) to Hadoop.
- Generated ETL reports using Tableau and created statistics dashboards for Analytics.
- Familiarity with common computing environment (e.g. Linux, Shell Scripting).
- Familiar with Java virtual machine (JVM) and multi-threaded processing.
- Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
- Development experience with Big Data/NoSQL platforms, such as MongoDB and Apache Cassandra.
- Migrated RDBMS databases to various NoSQL databases.
- Strong hands-on experience with DW platforms and databases like MS SQL Server 2012 and 2008, Oracle 11g/10g/9i, MySQL, DB2 and Teradata.
- Experience in designing and coding web applications using Core Java and web technologies: JSP, Servlets and JDBC.
- Extensive experience solving analytical problems with quantitative approaches and machine learning methods in R.
- Excellent knowledge in Java and SQL in application development and deployment.
- Familiar with data warehousing fact and dimension tables and star schemas, combined with Google Fusion Tables for visualization.
- Good working experience in PySpark and Spark SQL.
- Experience in creating various database objects like tables, views, functions and triggers using SQL.
- Good team player with ability to solve problems, organize and prioritize multiple tasks.
- Excellent technical, communication, analytical, problem-solving and troubleshooting skills, with the ability to work well with people from cross-cultural backgrounds.
TECHNICAL SKILLS:
Hadoop/Big Data: MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Storm, Kafka, Flume, Zookeeper, Oozie, Impala.
Programming Languages: C, Core Java, PL/SQL, Python, R, C#.
Java/J2EE Technologies: Servlets, JSP, JDBC, Java Beans, RMI & REST Web services.
Development Tools: Eclipse, NetBeans, SVN, Git, Maven, SOAP UI, JMX explorer, XML Spy, Jira, SQL Developer, QTOAD.
Methodologies: Agile/Scrum/Kanban Board, UML, Rational Unified Process and Waterfall.
Monitoring and Reporting: Ganglia, Nagios, Custom Shell scripts.
NoSQL Technologies: Accumulo, Cassandra, MongoDB, Neo4j, HBase
Frameworks: MVC, Struts, Hibernate and Spring.
Scripting Languages: Unix Shell Scripting, SQL, AngularJS
Distributed platforms: Hortonworks, Cloudera
Databases: Oracle 11g, MySQL, MS-SQL Server, Teradata, PostgreSQL, IBM DB2
Operating Systems: Windows XP/Vista/7/8/10, UNIX, Linux
Software Package: MS Office 2007/2010.
Web/Application Servers: WebLogic, WebSphere Application Server, Apache Tomcat
Visualization: Tableau, Kibana and MS Excel
Web Technologies: HTML, XML, JavaScript, jQuery, AJAX, SOAP, and WSDL.
PROFESSIONAL EXPERIENCE:
Confidential, Englewood, CO
Hadoop Developer
Responsibilities:
- Perform best practices for the complete software development life cycle including analysis, design, coding standards, code reviews, source control management and build processes
- Work collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solutions from disparate sources.
- Provide design recommendations and thought leadership to sponsors/stakeholders that improve review processes, resolve technical problems and suggest Big Data based analytical solutions from disparate sources.
- Coordinate with the Architects, Manager and Business team on current programming tasks.
- Document and demonstrate solutions by developing documentation, flowcharts, layouts, code comments and clear code.
- Producing detailed specifications and writing the program codes.
- Testing the code in controlled, real situations before deploying into production.
- Perform daily analysis on Teradata/Oracle source databases to implement ETL logic and data profiling.
- Importing data from different data sources like Teradata, Oracle into HDFS using Sqoop and performing transformations using Hive, Map Reduce and then loading data into HDFS.
- Process data ingested into HDFS using Sqoop and custom HDFS adaptors, analyze it using Spark, Hive and MapReduce, and deliver summary results from Hadoop to downstream systems.
- Develop simple to complex MapReduce streaming jobs using Java language for processing and validating the data.
- Used Accumulo NoSQL database as a disk persistent layer for GridGain cache layer.
- Build applications using Maven and integrate with Continuous Integration servers like Jenkins.
- Design and develop REST APIs that allow sophisticated, effective and low cost application Integration.
- Create/Modify Shell scripts for scheduling data cleansing scripts and ETL loading process.
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
- Create Sqoop jobs with incremental load to populate Hive External tables with partitions and bucketing enabled.
- Implement ETL logic that handles failover scenarios to ingest data from external sources through SFTP servers into HDFS.
- Developed end-to-end data processing pipelines that begin with receiving data using distributed messaging systems Kafka through persistence of data into HBase.
- Develop Spark applications to perform all the data transformations on User behavioral data coming from multiple sources.
- Configure Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
- Create Hive queries, which help market analysts to spot emerging trends and promotions by comparing fresh data with reference tables and historical metrics.
- Create components like Hive UDFs for missing functionality in HIVE for analytics.
- Work on Hadoop cluster with Active directory and Kerberos implemented to establish SSO connection with applications and perform various operations.
- Work on various performance optimizations like using distributed cache for small datasets, Partition, Bucketing in Hive and Map Side joins.
- Schedule jobs using the UC4 automation framework to run on a daily, weekly and monthly basis.
- Design and develop Map Reduce jobs to process data coming in different file formats like XML, CSV, flat files and JSON.
- Implement an Elasticsearch instance to read log files from servers, generate reports and visualize data on Kibana dashboards.
- Troubleshoot services running on application servers to have High Availability.
- Experience in large-scale streaming data analytics using Storm.
- Developed product profiles using Pig and commodity UDFs.
- Monitor application health and performance using Nagios. Provide production support and troubleshoot requests from end users.
- Take part in triage calls to handle and resolve defects reported by the Admin and QA teams.
Environment: Hadoop, Hortonworks, Linux, Hive, MapReduce, HDFS, HBase, Pig, Shell Scripting, Sqoop, Hue, Ambari, Yarn, Tez, Accumulo, GridGain, REST Web Services, Python, Java 7, MySQL, Eclipse, Oracle 11g, Maven, Log4j, Git, Kafka, Storm, Apache Spark, Elastic Search, Kibana, Kerberos, Nagios, Toad for Oracle, Teradata SQL Developer.
Confidential, Chevy Chase, MD
Hadoop Developer
Responsibilities:
- Devised and led the implementation of the next generation architecture for more efficient data ingestion and processing.
- Worked with highly unstructured and semi structured data of 90 TB in size with replication factor of 3.
- Developed simple to complex Map Reduce streaming jobs using Java language for processing and validating the data.
- Developed data pipeline using Map Reduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into forming baseline data.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
- Handled importing data from different data sources into HDFS using Sqoop and also performing transformations using Hive, Map Reduce and then loading data into HDFS.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems, and proposed translating some solutions to a lambda architecture.
- Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
- Good working knowledge of Amazon Web Service components like EC2, EMR, S3 etc.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
- Created and worked with Sqoop jobs with incremental load to populate Hive external tables.
- Streamed data in real time using Spark with Kafka.
- Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
- Implemented Spark using Python (PySpark) and Spark SQL for faster testing and processing of data.
- Experience in developing regression models on R for the statistical analysis.
- Experience in writing R or Python code using current best practices, including reproducible research and web-based data visualization.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Identified concurrent job workloads that may affect or be affected by failures or bottlenecks.
- Developed some utility helper classes to get data from HBase tables.
- Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
- Professional experience with NoSQL HBase solutions to solve real world scaling problems.
- Attending daily status calls to follow scrum process to complete each user story within the timeline.
- Implemented clusters for the NoSQL tools Cassandra and MongoDB as part of a POC to address HBase limitations.
- Worked on the implementation of a toolkit that abstracted Solr and Elasticsearch.
- Worked on Spark with Python and Scala.
- Loaded data back into Teradata for BASEL reporting and for business users to analyze and visualize using Datameer.
- Monitored various aspects of the cluster using Cloudera Manager.
Environment: Hadoop, Linux, CDH4, MapReduce, HDFS, Hbase, Hive, Pig, Shell Scripting, Sqoop, Python, Java 7, MySQL, NoSQL, Eclipse, Oracle 11g, Maven, Log4j, Git, Kafka, Storm, Apache Spark, Elastic Search, Solr, Datameer.
Acxiom, Conway, AR
Hadoop Developer
Responsibilities:
- Loaded data from different data sources (Teradata, DB2, Oracle and flat files) into HDFS using Sqoop and then into partitioned Hive tables.
- Created various Pig scripts and wrapped them as shell commands to provide aliases for common operations in the project business flow.
- Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes.
- Created a few Hive UDFs to hide or abstract complex repetitive rules.
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
- Involved in End to End implementation of ETL logic.
- Reviewed ETL application use cases before onboarding to Hadoop.
- Developed bash scripts to fetch log files from an FTP server and process them for loading into Hive tables.
- Scheduled all the bash scripts using the Resource Manager Scheduler.
- Moved data from HDFS to Cassandra using Map Reduce and BulkOutputFormat class.
- Developed Map Reduce programs for applying business rules to the data.
- Implemented Apache Kafka as a replacement for a more traditional message broker (JMS Solace) to reduce licensing costs, decouple processing from data producers and buffer unprocessed messages.
- Created HBase tables and column families to store the user event data.
- Wrote automated HBase test cases for data quality checks using HBase command line tools.
- Implemented a receiver-based approach in Spark Streaming, linking with StreamingContext through the Java API and handling proper closing and waiting for stages.
- Maintained the authentication module to support Kerberos.
- Experience in implementing rack topology scripts for the Hadoop cluster.
- Resolved issues related to the old Hazelcast EntryProcessor API.
- Participated with the admin team in designing and upgrading CDH 3 to HDP 4.
- Developed helper classes that abstract the Cassandra cluster connection and act as a core toolkit.
- Enhanced existing modules written as Python scripts.
- Used dashboard tools like Tableau.
Environment: Hadoop, Linux, MapReduce, HDFS, Hbase, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Java, Eclipse, Oracle 10g, Maven, Open source technologies Apache Kafka, Apache Spark, ETL, Hazelcast, Git, Mockito, python.
Confidential, San Francisco, CA
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop Ecosystem components and Cloudera manager using CDH distribution.
- Frequent interactions with Business partners.
- Designed and developed a Medicare-Medicaid claims system using Model-driven architecture on a customized framework built on Spring.
- Moved data from HDFS to Cassandra using Map Reduce and Bulk Output Format class.
- Imported trading and derivatives data in Hadoop Distributed File System and Eco System (MapReduce, Pig, Hive, Sqoop).
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Created tables in HBase and loaded data into HBase tables.
- Developed scripts to load data from HBase to the Hive metastore and perform MapReduce jobs.
- Was part of an activity to set up the Hadoop ecosystem in the dev and QA environments.
- Managed and reviewed Hadoop Log files.
- Responsible for writing Pig scripts and Hive queries for data processing.
- Running Sqoop for importing data from Oracle & Other Database.
- Creation of shell script to collect raw logs from different machines.
- Created static and dynamic partitions in Hive.
- Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT and UNION.
- Optimized the Hive tables using optimization techniques like partitions and bucketing to provide better performance with Hive QL queries.
- Defined Pig UDFs for financial functions such as swap, hedging, speculation and arbitrage.
- Coded many MapReduce programs to process unstructured log files.
- Imported and exported data into and out of HDFS and Hive using Sqoop.
- Used different data formats (Text format and Avro format) while loading the data into HDFS.
- Used parameterized Pig scripts and optimized them using ILLUSTRATE and EXPLAIN.
- Involved in configuring HA, resolving Kerberos security issues and performing NameNode failure restoration from time to time as part of maintaining zero downtime.
- Implemented FAIR Scheduler as well.
Environment: Hadoop, Linux, MapReduce, HDFS, HBase, Hive, Pig, Shell Scripting, Sqoop, CDH Distribution, Windows, Java 6, Eclipse, Ant, Log4j and JUnit
Confidential, Boston, MA
Java/J2EE Developer
Responsibilities:
- Write design document based on requirements from MMSEA user guide.
- Performed requirement gathering, design, coding, testing, implementation and deployment.
- Worked on modeling of Dialog process, Business Processes and coding Business Objects, Query Mapper and JUnit files.
- Involved in the design and creation of Class diagrams, Sequence diagrams and Activity Diagrams using UML models
- Created the Business Objects methods using Java and integrating the activity diagrams.
- Involved in developing JSP pages using Struts custom tags, jQuery and Tiles Framework.
- Used JavaScript to perform client side validations and Struts-Validator Framework for server-side validation
- Worked in web services using SOAP, WSDL.
- Wrote Query Mappers and JUnit test cases; experience with MQ.
- Developed the UI using XSL and JavaScript.
- Managed software configuration using ClearCase and SVN.
- Design, develop and test features and enhancements.
- Performed error rate analysis of production issues and technical errors.
- Developed a test environment for testing all the Web Services exposed as part of the core module and their integration with partner services in integration testing.
- Analyze user requirement document and develop test plan, which includes test objectives, test strategies, test environment, and test priorities.
- Responsible for performing end-to-end system testing of the application and writing JUnit test cases.
- Perform Functional testing, Performance testing, Integration testing, Regression testing, Smoke testing and User Acceptance Testing (UAT).
- Converted complex SQL queries running on mainframes into Pig and Hive as part of a migration from mainframes to a Hadoop cluster.
Environment: Shell Scripting, Java 6, JEE, Spring, Hibernate, Eclipse, Oracle 10g, JavaScript, Servlets, Nodejs, JMS, Ant, Log4j and Junit, Hadoop (Pig & Hive).
Confidential
Java Developer
Responsibilities:
- Involved in various SDLC phases like Design, Development and Testing.
- Involved in developing JSP pages using Struts custom tags, jQuery and Tiles Framework.
- Developed web pages using HTML, JavaScript, jQuery and CSS.
- Used various Core Java concepts such as Exception Handling, Collection APIs to implement various features and enhancements.
- Developed server-side servlet components for the application.
- Involved in coding, maintaining and administering Servlets and JSP components to be deployed on a WebSphere application server.
- Used automated test scripts and tools to test the application in various phases. Coordinated with Quality Control teams to fix issues that were identified.
- Implemented Hibernate ORM to map relational data directly to Java objects.
- Worked with Complex SQL queries, Functions and Stored Procedures.
- Involved in developing the portal application using the Spring Web MVC framework.
- Implemented the logging mechanism using log4j framework.
- Developed REST API, Web Services.
- Wrote test cases in JUnit for unit testing of classes.
- Used Maven to build the J2EE application.
- Used SVN to track and maintain the different versions of the application.
- Involved in maintenance of different applications with onshore team.
- Good working experience in processing claims with Tapestry.
- Working experience with professional billing claims.
Environment: Java, Spring Framework, Struts, Hibernate, RAD, SVN, Maven, WebSphere Application Server, Web Services, Oracle Database 11g, IBM MQ, JMS, HTML, JavaScript, XML, CSS, REST API.
