Hadoop/Big Data Lead Resume
Morris Plains, NJ
SUMMARY:
- Overall 8+ years of experience in enterprise application and product development.
- Experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem such as HDFS, MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark, Storm, Scala, Kafka, Oozie, Zookeeper, MongoDB, and Cassandra.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Cloudera Manager and Apache Ambari, and troubleshooting using Hadoop ecosystem components in HDP.
- Expertise in Hadoop architecture and its various components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1, and MRv2.
- Extensive experience in writing MapReduce programs in Java and Scala for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and various compressed formats.
- Built automated test frameworks to test data values, data integrity, source-to-target record counts, and field mappings between transactional, analytical data warehouse, and reporting systems.
- Drive continuous improvement by creating simplified processes and automating repetitive routines.
- Experience developing Scala applications for loading/streaming data into NoSQL databases (HBASE) and into HDFS.
- Experienced in importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
- Prior experience with Pentaho Data Integration (PDI).
- Extensive experience in designing online analytical processing (OLAP) and online transaction processing (OLTP) databases.
- Experienced working with Hortonworks Distribution and Cloudera Distribution.
- Expert knowledge in creating Agile test strategies, test cases, and test scenarios by reviewing business, functional, and non-functional requirements.
- Experience programming with various Hadoop ecosystem components, such as Impala; also able to code in Python.
- Proficient in developing data transformation and other analytical applications in Spark, Spark-SQL using Scala programming language.
- Profound experience in creating real-time data streaming solutions using Apache Spark Streaming, Kafka, NiFi, and RabbitMQ.
- Expertise in writing Apache Spark Streaming applications against big data distributions in active cluster environments.
- Developed an index manager for Elasticsearch that controls the size of indices even when millions of records are ingested per day.
- Experience in developing and designing POCs deployed on YARN clusters and comparing the performance of Spark with Hive.
- Very good experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
- Actively involved in various product developments and developed a key core framework using several design patterns.
- Extensive experience building and deploying multi-module projects using Ant, Maven, Git, SVN, etc.
- Experienced in working with Agile, Scrum, and Waterfall methodologies.
- Work successfully in fast-paced environments, both independently and in collaborative teams.
TECHNICAL SKILLS:
Big Data: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Nifi.
Java Technologies: Java, JAXP, AJAX, JFC Swing, Log4j.
J2EE Technologies: JSP, Servlets, JDBC, XML, JAXP, JavaBeans
Methodologies: Agile, Scrum, UML, Waterfall
Programming Language: Scala, Core Java
Database: Oracle, MySQL, Cassandra, HBase
Application Server: Apache Tomcat, JBoss
Web Tools: HTML, JavaScript, XPath, XQuery.
IDE/Build Tools: NetBeans, Eclipse, SBT, Maven
Operating System: Windows, Unix/Linux
Scripts: Bash, Python
Testing Tools: MRUnit, Selenium
WORK EXPERIENCE:
Confidential, Morris Plains, NJ
Hadoop/Big Data Lead
Responsibilities:
- Coordinated with business customers to gather business requirements, interacted with technical peers to derive technical requirements, and delivered the BRD, MDD, and TDD documents.
- Extensively involved in the design phase and delivered design documents. Experience across the Hadoop ecosystem with HDFS, Hive, Zookeeper, and Spark with Scala.
- Created data flows using Apache NiFi for monthly real-time data processing.
- Used Azure Data Factory, AzCopy, and Azure Storage Explorer to move data from external to internal Blob Storage.
- Worked on analyzing the Hadoop cluster and different big data components, including Hive, Spark, and Oozie.
- Migrated 100+ TB of data from different stores (Azure Blob Storage, MongoDB, SQL) to Hadoop, writing code across Hadoop ecosystem applications to achieve the required output within the sprint time period.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Defined job flows and used Hive to analyze partitioned and bucketed data and compute various metrics for reporting. Experienced in managing and reviewing Hadoop log files.
- Created test strategies and test cases, and ensured clear requirements in Version1 stories to support the creation of test scenarios and scripts.
- Created the test strategy and built test case templates for QA teams to use based on functional requirements.
Environment: Azure, Hortonworks, Hive, Oozie, Ambari, Edge Node, Name Node, Spark, Scala, Zookeeper, Eclipse, Azure Storage Explorer.
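The log-parsing work above can be sketched as a minimal standalone example. The log format, class name, and method name here are hypothetical illustrations, not taken from the actual project; the idea is normalizing a raw log line into a tab-delimited record of the kind a Hive table with a tab field delimiter could query directly.

```java
// Hypothetical sketch: convert a space-delimited log line
// (timestamp level component message...) into a tab-delimited record.
public class LogToTabular {
    public static String toTabRecord(String line) {
        // Split into at most 4 fields so the free-text message stays intact.
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 4) {
            return null; // skip malformed lines
        }
        return String.join("\t", parts);
    }

    public static void main(String[] args) {
        String raw = "2023-01-15T10:22:01 INFO DataNode Block replication started";
        System.out.println(toTabRecord(raw));
    }
}
```

In a real pipeline this transformation would run as a Hive job or preprocessing step before loading into a tab-delimited Hive table.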
Confidential, Newark, DE
Hadoop/Big Data Developer
Responsibilities:
- Developed high-integrity Spark programs used in systems where predictable and highly reliable operation is essential.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented the Flume, Spark, and Spark Streaming frameworks for real-time data processing.
- Developed an analytical component using Scala, Spark, and Spark Streaming.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis. Extracted and loaded data from MySQL into HDFS using Sqoop.
- Analyzed the SQL scripts and designed the solution to implement them using PySpark.
- Built, tuned, and maintained HiveQL and Pig scripts for loading, filtering, and storing data and for user reporting. Involved in creating Hive tables, loading data, and writing Hive queries.
- Handled importing of data from various data sources, performed transformations using Spark and loaded data into Cassandra.
- Worked on the Core, Spark SQL and Spark Streaming modules of Spark extensively.
- Used Scala to write code for all Spark use cases.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and YARN.
- Created test strategies and test cases, and ensured clear requirements in Version1 stories to support the creation of test scenarios and scripts.
- Created the test strategy and built test case templates for QA teams to use based on functional requirements.
- Baselined ETL/FTL specs and prepared detailed test documents for the ETL process using standard templates.
- Designed and published workbooks and dashboards using Tableau Dashboard/Server 6.X/7.X.
Environment: Core Java, multi-node installation, Map Reduce, Spark, Kafka, Hive, Impala, Zookeeper, Oozie, Python scripting, Scala, MySQL, Eclipse, Tableau 8.X/9.X and HP Vertica 6.X/7.X.
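The Hive/SQL-to-Spark conversion mentioned above can be illustrated with a rough sketch. This uses plain Java streams with hypothetical names rather than actual project code; a Spark RDD pipeline would express the same `SELECT word, COUNT(*) ... GROUP BY word` shape with analogous `flatMap` and `reduceByKey` transformations.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative only: rewriting a SQL-style GROUP BY / COUNT aggregation
// as functional transformations, mirroring the spirit of a Spark RDD pipeline.
public class SqlToTransform {
    public static Map<String, Long> wordCounts(List<String> lines) {
        return lines.stream()
                // map phase: each line becomes a stream of lowercase words
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(w -> !w.isEmpty())
                // reduce-by-key phase: group identical words and count them
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCounts(List.of("spark hive spark", "hive"));
        System.out.println(counts.get("spark") + " " + counts.get("hive")); // prints "2 2"
    }
}
```

The design point is the same one that makes such conversions worthwhile: once the query is expressed as composable transformations, the engine can parallelize each stage across partitions.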
Confidential, Bloomington, IN
Hadoop/Big Data Developer
Responsibilities:
- Worked with AWS services (EC2, S3, CloudFront, IAM).
- Deployed and configured EC2, Elastic Beanstalk, and RDS instances.
- Worked with the business users to gather, define business requirements and analyze the possible technical solutions.
- Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
- Designed and built the reporting application that uses the Spark SQL to fetch and generate reports on HBase table data.
- Integrated MapReduce with HBase to import bulk amount of data into HBase.
- Extracted feeds from social media sites such as Facebook, Twitter using Scala and Python scripts.
- Participated with admin in installation and configuring Map Reduce, Hive and HDFS.
- Found creative solutions for GUI testing with the QTP automation tool.
- Involved in integration, functional, and regression testing; created master test plans, business assurance strategy documents, test cases, and test data.
- Participated in daily/weekly meetings and provided status updates to the team.
Environment: Hortonworks Hadoop 2.0, EMR, Cloud Infrastructure (Amazon AWS), Java, Python, HBase, Hadoop ecosystem, Linux, Scala, Eclipse, Maven.
Confidential, Framingham, MA
Hadoop/Big Data Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in managing and reviewing Hadoop log files.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data; responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Specified the cluster size, allocated resource pools, and configured the Hadoop distribution by writing specification texts in JSON file format.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
- Exported the result set from Hive to MySQL using shell scripts.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hadoop distribution of Hortonworks, Cloudera, MapR, Flat files, Oracle, PL/SQL, Jenkins, Windows NT, UNIX Shell Scripting, Eclipse.
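The Hive UDF work listed above can be sketched as follows. This is plain standalone Java with hypothetical class and method names, showing only the kind of field-normalization logic a real UDF (a class extending Hive's UDF/GenericUDF with an `evaluate` method) would wrap for data cleaning.

```java
// Hypothetical sketch of the core logic of a text-cleaning Hive UDF:
// trim, collapse internal whitespace, and lowercase a free-text field
// so downstream joins and GROUP BYs see consistent values.
public class NormalizeField {
    public static String evaluate(String input) {
        if (input == null) {
            return null; // Hive UDFs must pass NULLs through unchanged
        }
        return input.trim().replaceAll("\\s+", " ").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("  Hadoop   ECOSYSTEM  ")); // prints "hadoop ecosystem"
    }
}
```

Registered in Hive, logic like this would be invoked per row, e.g. `SELECT my_normalize(raw_field) FROM logs;` (function name illustrative).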
Confidential, St. Petersburg, FL
Jr. Java Developer/Tester
Responsibilities:
- Involved in development of business domain concepts into Use Cases, Sequence Diagrams, Class Diagrams, Component Diagrams and Implementation Diagrams.
- Analyzed the requirements and communicated them to both the development and testing teams.
- Involved in fixing defects and unit testing with test cases using JUnit. Developed user and technical documentation.
- Provided connections using JDBC to the database and developed SQL queries to manipulate the data.
- Developed various test cases, such as unit tests, mock tests, and integration tests, using JUnit.
- Analyzed Software and Business Requirements documents to get a better understanding of the system from both technical and business perspectives.
- Performed testing of web-based applications.
- Involved in all phases and stages of Software testing life cycle.
- Experience in defining Testing Methodologies and Test Environment.
- Tested applications compatibility on different browsers (IE, Chrome, Safari and Firefox). Tested mobile version of application on iOS and Android devices as well as device emulators.
- Created Regression Testing Design Document and Conducted Regression Testing.
- Loaded Test Cases in Quality Center based on Use-Cases and Requirements, and executed test to verify actual results against expected results.
- Involved in Modifications or Rework to existing test cases whenever there is a change or enhancement in functionality of the application.
- Executing, Reporting and Tracking the defects using Quality Center and JIRA.
- Experience writing Stored Procedures, Functions and Packages.
Environment: Java, J2EE, Struts MVC, JDBC, JSP, JavaScript, HTML, Oracle, JUnit, Jenkins, Log4j, Eclipse.