
Big Data Developer Resume


Austin, TX

SUMMARY

  • 8+ years of professional IT experience, including the Big Data ecosystem and Java/J2EE technologies.
  • Experience with different Hadoop distributions such as Cloudera (CDH3, CDH4 and CDH5) and Hortonworks (HDP).
  • Hands-on experience in configuring and using Apache Hadoop/Cloudera ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, Impala, YARN, Flume, Zookeeper, Sqoop, HUE, Oozie and Azkaban.
  • Good knowledge of real-time data processing using Spark in Java/Scala/Python.
  • Good experience developing real-time data streaming solutions using Apache Spark/Spark Streaming, Kafka and Flume.
  • Hands-on experience handling different file formats such as SequenceFiles, text files, XML, JSON, Avro and Parquet.
  • Extended Hive and Pig core functionality by writing custom UDFs in Java/Python (a minimal UDF sketch follows this summary).
  • Extensive hands-on experience writing complex Hive queries and Pig Latin scripts.
  • Experience working with Flume/Kafka to load log data from different sources into HDFS.
  • Experience using Apache Sqoop to import and export data between HDFS and external RDBMS databases such as Netezza, Teradata, Oracle, SQL Server and MySQL.
  • Hands-on experience setting up workflows using the Apache Oozie workflow engine and Azkaban for managing and scheduling Hadoop jobs.
  • Experience using HCatalog with Hive, Pig, Impala and Spark SQL.
  • Good knowledge in handling messaging services using Apache Kafka.
  • Experienced in improving the performance and optimizing existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.
  • Good knowledge of using StreamSets to ingest data from various types of sources.
  • Extensive experience in both MapReduce frameworks, MRv1 and YARN (MRv2).
  • Good Knowledge on Hadoop Cluster installation, architecture and monitoring the cluster.
  • Involved in setting up standards and processes for Hadoop-based application design and implementation.
  • Worked on performance tuning of Hadoop jobs by applying techniques such as map-side joins, partitioning and bucketing.
  • Ingested data from web servers into HDFS using Flume.
  • Designed technical solutions for real-time analytics using HBase.
  • Experience with Elasticsearch and Elastic Path.
  • Used Pig to do transformations, event joins, filter bot traffic and some pre-aggregations before storing the data onto HDFS.
  • Good understanding in database and data warehousing concepts (OLTP & OLAP).
  • Extensive experience working with Oracle, DB2, SQL Server, Vertica and MySQL databases.
  • Hands-on NoSQL database experience with HBase, MongoDB and Cassandra.
  • Designed and Developed Talend Jobs to extract data from Oracle into MongoDB.
  • Expertise in functional and non-functional software testing, including test planning, test strategy and management, test estimation, test lead activities, authoring of test requirements and test cases, peer reviews, requirements traceability matrices, test execution, defect reporting and tracking, and manual and automation testing from conception to production.
  • Good knowledge of Data warehousing concepts and ETL and Teradata.
  • Extensively worked in creating and integrating Reports and Objects in MicroStrategy (Attributes, Filters, Metrics, Facts, Prompts, Templates, Consolidation and Custom Groups) in the data warehouse.
  • Experience writing Shell scripts in Linux OS and integrating them with other solutions.
  • Extensive experience developing enterprise solutions using Java, J2EE, Agile, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF, MVC and JMS.
  • Fluent with core Java concepts such as I/O, multi-threading, exceptions, collections, data structures and serialization.
  • Well versed with complete Software Development Life Cycle process which includes Designing, Developing, Testing and Implementation.
  • Good organizational skills and ability to multi-task and work independently. Excellent written and verbal communication skills, including experience with proposals and presentations.
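
The custom-UDF bullet above is illustrated by the following minimal sketch, assuming a simple string-normalization use case; the package, class and function names are hypothetical rather than taken from any project listed here.

```java
// Hypothetical Hive UDF: trims and upper-cases a string column.
// It could be registered in Hive with, e.g.:
//   CREATE TEMPORARY FUNCTION normalize_str AS 'com.example.hive.NormalizeString';
package com.example.hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;  // let NULLs pass through unchanged
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```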

TECHNICAL SKILLS

Big data/Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Kafka, HBase, Impala, Sqoop, Flume, Spark, Oozie, Cloudera Manager and Zookeeper.

Hadoop Distributions: Apache Hadoop, CDH3, CDH4, Hortonworks.

Java / J2EE Technologies: Core Java, Servlets, JSP, JDBC, XML.

Programming Languages: C, C++, Java, SQL, PL/SQL, Linux/Python shell scripts, Scala

Database: Oracle 11g/10g, DB2, MySQL, Teradata, Vertica.

Web Technologies: HTML, XML, JDBC, JSP, JavaScript.

Frameworks: MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2.

Tools: Eclipse, IntelliJ, Git, NetBeans, Jenkins, Jira, MicroStrategy (BI).

Operating System: Ubuntu (Linux), Windows, Mac OS, Red Hat

Testing: Hadoop Testing, Hive Testing, Quality Center (QC), ETL, Manual testing

PROFESSIONAL EXPERIENCE

Big Data Developer

Confidential, Austin, TX

Responsibilities:

  • Wrote scripts using Pig, HiveQL and Unix shell.
  • Wrote Sqoop jobs for loading data from the data warehouse to Hadoop systems and vice versa.
  • Wrote and executed SQL queries to work with structured data in relational databases and to validate the transformation/business logic.
  • Used Flume to move data from individual data sources into the Hadoop system.
  • Used the MRUnit framework to test MapReduce code (see the test sketch after this list).
  • Responsible for building scalable distributed data solutions using the Hadoop ecosystem and Spark.
  • Involved in data acquisition and pre-processing of various types of source data using StreamSets.
  • Responsible for design & development of Spark SQL Scripts using Scala/Java based on Functional Specifications.
  • Tested a POC with a smaller Hortonworks cluster.
  • Performed real-time analysis of the incoming data using Kafka, Flume and Spark Streaming.
  • Designed and managed the big data warehouse in Hive.
  • In the pre-processing phase, used Spark to remove missing data and apply transformations to create new features (see the Spark sketch after this list).
  • In the data exploration stage, used Hive and Impala to gain insights into the customer data.
  • Imported and exported data between various RDBMSs and HDFS/Hive using Sqoop.
  • Implemented the workflows using Apache Oozie/Azkaban to automate tasks.
  • Good knowledge on Hadoop Cluster architecture and monitoring the cluster.
  • Created requirements traceability matrices (RTM) for the releases.
  • Understood ETL design and mapping documents, identified test scenarios and prepared test cases.
  • Understood data mapping from source to target tables and the business logic used to populate the target tables.
  • Involved in setting up the testing environments and preparing test data to validate both positive and negative cases.
  • Worked on both Hadoop distributions: Cloudera and Hortonworks.
  • Built Cassandra Cluster on both the physical machines and on AWS.
  • Experience developing and maintaining applications written for Amazon Simple Storage Service (S3), AWS Elastic Beanstalk and AWS CloudFormation.
  • Verified and validated missing, duplicate, null and default records as per the design specifications.
  • Responsible for testing and validating the data at all stages of ETL process.
  • Worked with Data Warehouse applications using Teradata, Vertica.
  • Validated each stage for failed records and ensured updates to error logs and triggering of email notifications to support teams.
  • Involved in smoke, integration, system, data redundancy, security, performance, end-to-end and ETL workflow testing.
  • Spun up AWS services such as EC2, EBS, S3 and EMR in a VPC using CloudFormation templates.
  • Used Bugzilla for managing, presenting and tracking defects/bugs and conducted defect review meetings to prioritize defects.
  • Assisted in preparing weekly status, test execution and test metrics reports.
  • Worked with JIRA for issue and bug tracking.
  • Worked closely with AWS to migrate entire data centers to the cloud using VPC, EC2, S3, EMR, RDS, Splice Machine and DynamoDB services.
  • Generated and implemented MicroStrategy reports by creating projects, attributes, hierarchies, metrics, filters and templates using MicroStrategy Architect.
  • Responsible for review of all test reports and provide the QA sign off for all the staging deployments.
  • Participated in other testing and process improvement projects as needed.
  • Worked with the Zena scheduler to schedule the entire data flow.
  • Developed test plans, test strategy, test scenarios, test cases and test scripts to thoroughly test PPM-related application changes and new products.
  • Structured data was ingested into the data lake using Sqoop jobs scheduled with Oozie workflows from the RDBMS data sources for incremental loads.
  • Streaming Data was ingested into the data lake using Flume.
  • Authored and executed functional and non-functional test cases that explore and verify application functionality and conducted reviews with the business and development teams.
  • Analyzed and troubleshot erroneous results, determined the root causes, logged defects in Bugzilla and enabled defect management, including working with the development team on the resolution of software-related defects.
  • Tested new scripts implemented within ADSP and ensured that the application works in all possible scenarios.
  • Worked closely with Business Analysts, Developers and the Scrum Master to ensure a timely and effective testing effort.
  • Identified, defined and implemented testing process and procedure improvements to continuously enhance product and process quality.
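
As referenced in the MRUnit bullet above, the following is a minimal test sketch; the word-count-style mapper is a hypothetical stand-in, included only so the example is self-contained, not the production mapper.

```java
// Minimal MRUnit sketch: drives a hypothetical mapper through one record
// and asserts on the emitted (key, value) pairs.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class TokenCountMapperTest {

    // Hypothetical mapper: emits (token, 1) for every whitespace-separated token.
    public static class TokenCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new TokenCountMapper());
    }

    @Test
    public void emitsOneCountPerToken() throws IOException {
        mapDriver.withInput(new LongWritable(0), new Text("hadoop spark"))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .withOutput(new Text("spark"), new IntWritable(1))
                 .runTest();
    }
}
```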
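
The pre-processing bullet above is sketched below in Java/Spark, assuming CSV input and hypothetical column names (income, years_active) and HDFS paths; it is illustrative only, not the production job.

```java
// Sketch of a Spark pre-processing step: drop rows with missing values,
// derive a new feature column, and write the result as Parquet.
// Intended to be submitted with spark-submit (master set externally).
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CustomerPreprocess {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("customer-preprocess")
                .getOrCreate();

        Dataset<Row> raw = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/raw/customers");   // hypothetical input path

        // Drop records with any missing fields, then derive a new feature.
        Dataset<Row> cleaned = raw.na().drop()
                .withColumn("income_per_year",
                        col("income").divide(col("years_active")));

        cleaned.write().mode("overwrite")
                .parquet("hdfs:///data/curated/customers");  // hypothetical output path

        spark.stop();
    }
}
```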

Environment: Zena, Big Data, Hadoop, Sqoop, Flume, HiveQL, Pig Latin, Windows 7, UNIX, Putty, WinSCP, Bugzilla, Agile Project Methodology, JIRA, Vertica, SQL, Teradata, MicroStrategy, ETL.

Hadoop Developer

Confidential, Rosemont, IL

Responsibilities:

  • Developed multiple MapReduce programs to analyze customer data and produce summary results from Hadoop for downstream systems (see the job sketch after this list).
  • Worked on importing and exporting data from Oracle and DB2 into HDFS using Sqoop.
  • Developed data pipeline using Flume to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Prepared the best practices in writing Map Reduce programs and Hive scripts.
  • Scheduled a workflow to import the weekly transactions in the revenue department from RDBMS database using Oozie.
  • Built wrapper shell scripts around these Oozie workflows.
  • Developed Pig Latin and Python scripts to transform the log data files and load them into HDFS.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Knowledge and proven experience in object-oriented design, SOA and distributed computing, and experience working in an Agile/Scrum model.
  • Created external Hive tables and was involved in data loading and writing Hive UDFs.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, backed by the ZooKeeper implementation on the cluster.
  • Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Developed unit test cases using MRUnit for MapReduce code.
  • Involved in creating Hadoop streaming jobs.
  • Involved in Installing, Configuring Hadoop Eco System and Cloudera Manager using CDH4 Distribution.
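
A hedged sketch of the kind of MapReduce job referenced in the first bullet of this list; the record layout (pipe-delimited id|date|amount), class names and paths are assumptions for illustration only, not the actual production code.

```java
// Sketch of a customer-summary MapReduce job: sums a numeric amount per
// customer id so downstream systems receive one summary record each.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CustomerSummaryJob {

    // Emits (customerId, amount) from pipe-delimited transaction records.
    public static class SummaryMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length >= 3) {                      // id|date|amount (assumed layout)
                ctx.write(new Text(fields[0]),
                          new LongWritable(Long.parseLong(fields[2])));
            }
        }
    }

    // Sums the amounts per customer to produce the downstream summary.
    public static class SummaryReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable v : values) {
                total += v.get();
            }
            ctx.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "customer-summary");
        job.setJarByClass(CustomerSummaryJob.class);
        job.setMapperClass(SummaryMapper.class);
        job.setCombinerClass(SummaryReducer.class);   // sum is associative, so reuse the reducer
        job.setReducerClass(SummaryReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // raw transactions
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // summary output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```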

Environment: Hadoop, Map Reduce, ETL, Hive, HDFS, PIG, Scala, Agile, Sqoop, Oozie, Cloudera, Flume, HBase, Zookeeper, CDH3, Oracle, NoSQL and Unix/Linux.

Hadoop Engineer

Confidential, Ridgefield Park, NJ

Responsibilities:

  • Participated in gathering and analyzing requirements and designing technical documents for business requirements.
  • Worked on large Hadoop cluster with Kerberos environment including KMS and KTS Servers.
  • Loaded and transformed large sets of flat files and semi-structured data, including XML files.
  • Developed a Java program to convert semi-structured data to CSV files and load them into Hadoop.
  • Orchestrated hundreds of Sqoop queries and Hive queries using Oozie workflows and Coordinators.
  • Used partitioning, bucketing, map-side joins and parallel execution to optimize Hive queries, and partitioning on Impala tables.
  • Implemented Spark SQL scripts using the DataFrame and Dataset APIs.
  • Responsible for deploying Hive UDFs to the Hadoop prod and dev clusters.
  • Integrated Tableau with Impala and published workbooks from Tableau Desktop to Tableau Server.
  • Published Tableau workbooks from multiple data sources and scheduled automated refreshes on Tableau Server.
  • Spun up Hadoop clusters in AWS using Cloudera Director.
  • Responsible for handling part of dev operations like daily job monitoring, interacting with Cloudera team and providing cluster access to the new users.
  • Responsible for granting access roles on the databases and tables using Sentry.
  • Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
  • Imported data into Spark from a Kafka consumer group using the Spark Streaming APIs (sketched below).
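
A minimal sketch of the Kafka-to-Spark Streaming import noted above, using the spark-streaming-kafka-0-10 direct stream API; the broker address, topic name and consumer group id are hypothetical, and the per-batch count stands in for the real analysis.

```java
// Reads a Kafka topic as a direct DStream and prints a record count per
// 30-second micro-batch. Intended to be launched with spark-submit.
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaStreamPoc {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-stream-poc");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "events-poc");              // hypothetical consumer group
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Arrays.asList("events");    // hypothetical topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Count records per micro-batch as a placeholder for the real analysis.
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```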

Environment: AWS, Amazon S3, Impala, Hive, Spark SQL, Shell, Cloudera Enterprise, Cloudera Director, Cloudera Navigator, Cloudera Manager, Sentry, Jira and Github.

Hadoop Developer

Confidential, Dallas, TX

Responsibilities:

  • Worked on analyzing, writing Hadoop Map Reduce jobs using Java API, Pig and Hive.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Installed/configured/maintained Hortonworks Hadoop clusters for application development and Hadoop tools such as Hive, Pig, HBase, Zookeeper and Sqoop.
  • Worked on installing the cluster and commissioning and decommissioning of data nodes and name nodes.
  • High availability, capacity planning, and slots configuration.
  • Configured MySQL Database to store Hive metadata.
  • Created partitions and buckets in Hive and designed both managed and external tables in Hive for optimized performance (see the DDL sketch after this list).
  • Hands-on experience writing core Java programs to perform data cleaning, pre-processing and validation.
  • Created Map Reduce jobs using Pig Latin and Hive Queries.
  • Developed Pig Latin scripts and used Pig as ETL tool for transformations, event joins, and filter.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Worked in tuning Hive and Pig scripts to improve performance.
  • Created Hive scripts to extract, transform, load (ETL) and store the data.
  • Created and exposed Hive views through Impala for the business users.
  • Administered, installed, upgraded and managed CDH3, Pig, Hive and HBase.
  • Prepared developer (unit) test cases and executed developer testing.
  • Performed unit testing and regression testing and shared the screenshots to the clients using ETL.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Used Data Lake concepts to store data in HDFS.
  • Worked on different data sources such as Oracle, MySQL, Flat files etc.
  • Used the Eclipse IDE for designing, coding and developing applications.
  • Developed Pig and Python scripts to transform the raw data into intelligent data as specified by business users.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Created Talend Mappings to populate the data into dimensions and fact tables.
  • Broad design, development and testing experience with Talend Integration Suite and knowledge in Performance Tuning of mappings.
  • Experienced in Talend Data Integration, Talend Platform Setup on Windows and UNIX systems.
  • Processed log file data into CSV format and performed analytics per client requirements on a Hortonworks Hadoop cluster with Tableau visualization.
  • Implemented a script to transmit sysprin information from Teradata and Oracle to HBase using Sqoop.
  • Wrote a technical paper and created slideshow outlining the project and showing how Cassandra can be potentially used to improve performance.
  • Implemented best income logic using Pig scripts and UDFs.
  • Involved in designing and developing the Hive data model, loading data and writing Java UDFs for Hive.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of Pig queries.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Managed data coming from different sources.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Experience in managing and reviewing Hadoop log files using Flume.
  • Job management using the Fair Scheduler.
  • Designed a data warehouse using Hive.
  • Used Tableau for Data Visualization of queries in the Hive Summary tables.
  • Exported the analyzed data to the relational databases using Sqoop.
  • Wrote query mappers and JUnit test cases; experience with MQ.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
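
The partitioned/bucketed Hive table design noted above is sketched below as Hive DDL issued over Hive JDBC; the HiveServer2 URL, database, table and column names are hypothetical placeholders, not the actual schema.

```java
// Creates one external table over raw HDFS data and one managed table that is
// partitioned by date and bucketed by customer id, enabling partition pruning
// and bucketed map-side joins. Requires the Hive JDBC driver on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveTableSetup {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");   // HiveServer2 JDBC driver

        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver2.example.com:10000/sales");  // hypothetical host/db
             Statement stmt = con.createStatement()) {

            // External table over data already landed in HDFS (e.g. by Sqoop/Flume).
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS transactions_raw ("
                    + " txn_id BIGINT, customer_id BIGINT, amount DOUBLE, txn_ts STRING)"
                    + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                    + " LOCATION '/data/raw/transactions'");

            // Managed table, partitioned by date and bucketed by customer id.
            stmt.execute("CREATE TABLE IF NOT EXISTS transactions ("
                    + " txn_id BIGINT, customer_id BIGINT, amount DOUBLE)"
                    + " PARTITIONED BY (txn_date STRING)"
                    + " CLUSTERED BY (customer_id) INTO 32 BUCKETS"
                    + " STORED AS PARQUET");
        }
    }
}
```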

Environment: Hadoop, Hortonworks, Sqoop, Flume, Oozie, MapReduce, Spark, Storm, Scala, HDFS, Pig, Python, Hive, Impala, HBase, Kafka, Java, Oracle, MySQL, Shell Scripting, Ubuntu, Eclipse, Unix and Tableau.

Java/J2EE Developer

Confidential 

Responsibilities:

  • Involved in requirement gathering, functional and technical specifications.
  • Monitoring and fine tuning IDM performance.
  • Enhancements in the self-registration process.
  • Fixing the existing bugs in various releases.
  • Global deployment of the application and coordination between the client, development team and the end users.
  • Setting up of the users by reconciliations, bulk load and bulk link in all the environments.
  • Wrote requirements and detailed design documents, designed architecture for data collection.
  • Developed OMSA UI using MVC architecture, Core Java, Java Collections, JSP, JDBC, Servlets and XML within a Windows and UNIX environment.
  • Used Java collection classes such as ArrayList, Vector, HashMap and Hashtable.
  • Used design patterns: MVC, Singleton, Factory and Abstract Factory.
  • Created Hibernate O/R mappings of the tables and integrated them with Spring transaction management (see the sketch after this list).
  • Used Maven for building the application and completed testing by deploying on the application server.
  • Experienced in building RESTful (AJAX/JSON) applications.
  • Developed algorithms and coded programs in Java.
  • Designed various tables required for the project in Oracle 10g database and involved in coding the SQL Queries, Stored Procedures and Triggers in the application.
  • Pushed the code to Jenkins and integrated the code with Maven.
  • Coordinated with different IT groups and the customer.
  • Developed unit test cases; used JUnit for unit testing of the application.
  • Involved in design and implementation using Core Java, Agile, Struts, and JMS.
  • Developed complex SSAS cubes with multiple fact measures groups, and multiple dimension hierarchies based on the OLAP reporting needs.
  • Performed all types of testing, including unit testing and integration testing across environments.
  • Worked on modifying an existing JMS messaging framework for increased loads and performance optimizations.
  • Used a combination of client-side and server-side validation using the Struts validation framework.
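
A hedged sketch of the Hibernate ORM plus Spring transaction-management integration mentioned above; the entity, DAO and table names are hypothetical, and it assumes a SessionFactory and HibernateTransactionManager are configured in the Spring context.

```java
// Hypothetical entity and DAO: Spring opens and commits (or rolls back) the
// transaction around each @Transactional method, while Hibernate handles the
// object-relational mapping.
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.SessionFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

@Entity
@Table(name = "USERS")          // hypothetical table mapping
class User {
    @Id
    private Long id;
    private String email;
    // getters/setters omitted for brevity
}

@Repository
public class UserDao {

    private final SessionFactory sessionFactory;

    @Autowired
    public UserDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    @Transactional
    public void save(User user) {
        // getCurrentSession() participates in the Spring-managed transaction
        sessionFactory.getCurrentSession().saveOrUpdate(user);
    }

    @Transactional(readOnly = true)
    public User findById(Long id) {
        return (User) sessionFactory.getCurrentSession().get(User.class, id);
    }
}
```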

Environment: Java, Agile, JSP, JSON, Ajax, Design Patterns, Struts, Spring, Hibernate, Oracle, SQL/PL-SQL, JMS.
