Sr. Hadoop Tester Resume

Baltimore, MD

PROFESSIONAL SUMMARY:

  • Over 8 years of experience in analysis, design, development, testing, and deployment in the IT industry.
  • Knowledge of all phases of the software development life cycle (SDLC).
  • 5 years of experience installing, configuring, and testing Hadoop ecosystem components.
  • Experienced in working with Pig, Hive, Sqoop, and MapReduce.
  • Extensive experience with HBase and Flume.
  • Experience installing and developing on the ELK stack (Elasticsearch, Logstash, Kibana).
  • Worked on Spark Streaming using Scala.
  • Worked with the PySpark module.
  • Implemented Kafka pipelines for streaming data.
  • Knowledge of Jenkins and Git.
  • Built Tableau dashboards.
  • Integrated SQL data sources with Tableau.
  • Good experience working with the Hortonworks, MapR, and Cloudera distributions.
  • Created a 360-degree view of customer data for a financial client in a Hadoop data lake.
  • Implemented a multi-node Hadoop cluster on AWS storage.
  • Worked on different ETL processes for the data ingestion module.
  • Extensive experience with ETL technologies, such as Informatica.
  • Experience in end-to-end design, development, maintenance, and analysis of various types of applications using efficient data science methodologies and Hadoop ecosystem tools.
  • Experience providing solution architecture for Big Data projects using the Hadoop ecosystem.
  • Experienced in Hadoop cluster setup and performance tuning, developing logical and physical data models using Hive for analytics, data lake creation using Hive, and data load management using Sqoop.
  • Implemented Hive workflows within Cascading flows and Cascades.
  • Performed data processing operations in Scala with Scalding.
  • Developed Cascading applications on Hadoop that integrate with Teradata.
  • Experience in Linux shell scripting.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
  • Familiar with data architecture including data ingestion pipeline design, Hadoop Information architecture, data modeling and data mining, machine learning and advanced data processing.
  • Database/ETL performance tuning: broad experience in database development, including effective use of database objects, SQL Trace, Explain Plan, different optimizer types, hints, indexes, table partitions and sub-partitions, materialized views, global temporary tables, autonomous transactions, bulk binds, and MS SQL built-in functions; coded database objects such as triggers, procedures, functions, and views.
  • Performance tuning of Informatica mappings and workflows.
  • Exposure to T-SQL programming and architecture; translated complex legacy processes into T-SQL procedures, functions, and packages.
  • Knowledge of working with star and snowflake schemas.
  • Excellent interpersonal skills and strong analytical and problem-solving skills with a customer-oriented attitude.
  • A very good team player; self-motivated and dedicated in any work environment.

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Spark, Scala, Cascading, Zookeeper, Hive, Pig, Sqoop, Flume and Pivotal HD.

Programming Languages: Java/J2EE, Python, Scala, C, R.

Scripting Languages: JSP & Servlets, JavaScript, XML, HTML and Bash.

Databases: MS SQL Server, Oracle, Vertica, MongoDB (NoSQL).

IDEs & Tools: Rational Rose, Rational Team Concert, Eclipse, NetBeans, JUnit, jQuery, MQ, TOAD, SQL Developer, Microsoft Visual Studio 2008/2010, Yum, RPM.

Versioning Tools: SVN, CVS, Dimensions, and MS Team Foundation Server.

Scripts & Libraries: JavaScript, AngularJS, Node.js, FreeMarker, Groovy, Maven, Ant scripts, XML DTDs, XQuery, XPath, XSLT, XSDs, JAXP, SAX, and JDOM.

Markup Languages: XSLT, XML, XSL, HTML5, DHTML, CSS, OOCSS, jQuery, AJAX.

Operating Systems: Red Hat Linux 6.2/6.3, Unix, Solaris, Windows 7/8.

PROFESSIONAL EXPERIENCE:

Confidential

Sr. Hadoop Tester

Baltimore, MD

Responsibilities:

  • Provided solution architecture for Big Data projects using the Hadoop ecosystem.
  • Set up the Hadoop cluster and performed performance tuning; developed logical and physical data models using Hive for analytics, file processing using Pig, and data load management using Sqoop.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Performed data modeling for the Hive data lake in the Hadoop cluster.
  • Analyzed data mapping from source to target tables and the business logic used to populate the target tables.
  • Set up testing environments and prepared test data for testing flows to validate positive and negative cases.
  • Verified and validated missing, duplicate, null, and default records as per the design specifications.
  • Assisted in preparing weekly status, test execution, and test metrics reports.
  • Reviewed all test reports and provided QA sign-off for all staging deployments.
  • Verified the data flow (data mapping, record counts) from source to target.
  • Verified that the data transformation logic from source to target worked as expected.
  • Verified database structural changes, if any, and that DB fields and field data were loaded without truncation.
  • Performed data validations such as valid-value, null-value, duplicate, and blank-record checks (see the validation sketch after this list).
  • Verified and validated data integrity.
  • Performed SQL data modeling before data was moved into the staging area.
  • Wrote Pig scripts to process the data.
  • Worked extensively with Hive.
  • Implemented in-memory computing with Apache Spark, written in PySpark.
  • Processed real-time data using Spark Streaming in PySpark (see the streaming sketch after this list).
  • Used a Kafka pipeline for streaming data transmission.
  • Used Spark SQL to query the Spark Streaming data directly.
  • Worked extensively on the Spark Core and Spark SQL modules.
  • Plugged Elasticsearch into Cascading flows.
  • Developed multiple POCs using Spark Streaming, deployed them on the YARN cluster, and compared the performance of Spark with Storm.
  • Installed different Hadoop ecosystem components through Ambari.
  • Continuously monitored and managed the Hadoop cluster through Ambari.
  • Used the ELK stack (Elasticsearch, Logstash, and Kibana) for customer name-pattern search (see the Elasticsearch sketch after this list).
  • Installed the Logstash RPM and started Logstash in our environment.
  • Used Elasticsearch for name-pattern matching, customized to the requirement.
  • Installed and ran logstash-forwarder to push data.
  • Used Kibana to visualize Elasticsearch data.
  • Created different dashboards based on the user level and integrated them with the customer-care support UI.
  • Completed a POC on Greenplum with Spark Streaming.
  • Used the MADlib and MLlib packages with Spark Streaming to train and model incoming real-time data along with historical data.
  • Built Tableau dashboards.
  • Integrated SQL data sources with Tableau.
  • Created a Hive-based data lake that stores all Hadoop-related batch data.
  • Applied business concepts to design and maintain an internal database for the Advanced Analytics group with MySQL and MemSQL, including database backup, restore, and optimization.
  • Used ETL to extract files for external vendors and coordinated that effort.
  • Used Change Data Capture (CDC) to simplify ETL in data warehouse applications.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Developed and executed shell scripts to automate jobs.
  • Supported MapReduce programs running on the cluster.
  • Defined job flows.
  • Worked on HBase; moved data from HDFS into HBase for analysis.
  • Managed and reviewed Hadoop log files.
  • Developed shell scripts to automate routine DBA tasks (e.g., database refreshes, backups, monitoring).
  • Moved files in various formats (TSV, CSV, etc.) from RDBMS to HDFS for further processing.
  • Gathered business requirements by coordinating and communicating with the business team.
  • Prepared documents for mapping design and production support.
  • Wrote Apache Pig scripts to process HDFS data and send it to HBase.
  • Developed Hive reports and partitioned Hive tables.
  • Moved user information from MS SQL Server to HBase using Sqoop.
  • Involved in the integration of Hive and HBase.
  • Created MapReduce jobs using Hive/Pig queries.
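
The Spark Streaming and Kafka work above followed the common pattern of consuming a Kafka topic in PySpark micro-batches and querying each batch with Spark SQL. The sketch below is a minimal illustration of that pattern using the Spark 1.x/2.x KafkaUtils direct-stream API; the broker address, topic name, and CSV record layout are hypothetical.

    from pyspark import SparkContext
    from pyspark.sql import SparkSession, Row
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # spark-streaming-kafka-0-8 package

    sc = SparkContext(appName="KafkaSparkStreamingSketch")
    ssc = StreamingContext(sc, 10)          # 10-second micro-batches
    spark = SparkSession.builder.getOrCreate()

    # Hypothetical broker and topic.
    stream = KafkaUtils.createDirectStream(
        ssc, ["customer-events"], {"metadata.broker.list": "broker1:9092"})

    def process(time, rdd):
        if rdd.isEmpty():
            return
        # Kafka records arrive as (key, value); assume CSV values: cust_id,name,amount
        rows = (rdd.map(lambda kv: kv[1].split(","))
                   .map(lambda f: Row(cust_id=f[0], name=f[1], amount=float(f[2]))))
        df = spark.createDataFrame(rows)
        df.createOrReplaceTempView("events")
        # Spark SQL directly over the streaming micro-batch.
        spark.sql("SELECT cust_id, SUM(amount) AS total FROM events GROUP BY cust_id").show()

    stream.foreachRDD(process)
    ssc.start()
    ssc.awaitTermination()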
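
For the source-to-target validations listed above (counts, nulls, duplicates, missing records), a PySpark pass along the following lines is one way to script them; the staging and lake table names and the cust_id key column are assumptions for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    src = spark.table("staging.customer_src")   # hypothetical source table
    tgt = spark.table("lake.customer_tgt")      # hypothetical target table

    # Count check: row counts should match after the load.
    print("source rows:", src.count(), "target rows:", tgt.count())

    # Null check on a mandatory column.
    print("null cust_id:", tgt.filter(F.col("cust_id").isNull()).count())

    # Duplicate check on the business key.
    dupes = tgt.groupBy("cust_id").count().filter(F.col("count") > 1)
    print("duplicate keys:", dupes.count())

    # Missing-record check: keys present in source but absent from target.
    missing = src.select("cust_id").subtract(tgt.select("cust_id"))
    print("missing keys:", missing.count())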
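
The ELK name-search work can be illustrated with the Python Elasticsearch client. The sketch below assumes a hypothetical customers index populated by Logstash, a name field with a .keyword sub-field, and a local cluster address.

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])   # hypothetical cluster address

    def search_names(pattern, size=10):
        """Fuzzy and wildcard name lookup over documents shipped in by Logstash."""
        body = {
            "query": {
                "bool": {
                    "should": [
                        {"match": {"name": {"query": pattern, "fuzziness": "AUTO"}}},
                        {"wildcard": {"name.keyword": "*" + pattern + "*"}},
                    ]
                }
            },
            "size": size,
        }
        return es.search(index="customers", body=body)["hits"]["hits"]

    for hit in search_names("smith"):
        print(hit["_source"].get("name"), hit["_score"])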

Environment: Linux, HDP, Eclipse, Elasticsearch, Cascading, Kerberos, Ranger, HDFS, Pig, Hive, Sqoop, Flume, Java, JEE, Python, Spark, HBase, SQL Server 2014.

Confidential, New York City, NY

Hadoop Consultant

Responsibilities:

  • Performed root cause analysis in the client data warehouse.
  • Maintained the client relationship by communicating daily and weekly project status.
  • Analyzed system failures, identified root causes, and recommended courses of action.
  • Documented system processes and procedures for future reference.
  • Coded in Python with the NumPy, SciPy, and pandas modules; performed statistical analysis on different data patterns to recommend the patterns with the highest accuracy in predicting the next data point; used Matplotlib and Excel to generate charts and reported to the customer via PowerPoint in weekly meetings (see the pandas sketch after this list).
  • Analyzed test results, identified bugs, and reported them to developers using Quality Center.
  • Ran SQL scripts and checked records against the main table on the server.
  • Performed extensive backend testing using database query tools; wrote SQL queries to retrieve data from SQL Server and checked data integrity and validation.
  • Worked on patient data, data distribution, configuration utilities, and the installation process.
  • Worked on a POC that built custom Python scripts for pre-load, post-load, and control processes while loading data into Vertica from Hadoop.
  • Implemented a table-level partitioning strategy to streamline the load process on the Vertica database.
  • Participated in design reviews for migrating the platform from MS SSAS to Vertica/ROLAP; refactored multiple ETL approaches into a single modular one and refined the security scheme to enable multiple layers of defense.
  • Involved in the design, analysis, implementation, testing, and support of ETL processes for Stage, ODS, and Mart.
  • Integrated different data sources using Hive to create a single large table in the data lake.
  • Prepared ETL standards and naming conventions and wrote ETL flow documentation for Stage, ODS, and Mart.
  • Used ETL to extract files for external vendors and coordinated that effort.
  • Used Change Data Capture (CDC) to simplify ETL in data warehouse applications.
  • Created a solution combining Vertica and Hadoop for clickstream data.
  • Created Python scripts to conduct routine maintenance and deliver ad hoc reports; monitored and tuned user-developed JavaScript.
  • Facilitated storage by identifying the need for and subsequently developing JavaScript to archive GridFS collections.
  • Proactively developed and implemented a Python script to report the health and metadata of a shard cluster.
  • Designed and implemented sharding and indexing strategies.
  • Monitored deployments for capacity and performance.
  • Defined and implemented backup strategies per data retention requirements.
  • Developed and documented best practices for data migration.
  • Handled incident and problem management.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Handled structured and unstructured data and applied ETL processes.
  • Managed and reviewed Hadoop log files.
  • Ran Hadoop streaming jobs to process terabytes of data.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Proactively monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, backups, and disaster recovery systems and procedures.
  • Performed visualization on different input data using SQL integrated with Tableau.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile, and network devices, and pushed it to HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure condition.
  • Loaded data from the UNIX file system to HDFS, configured Hive, and wrote Hive UDFs.
  • Used Java and MS SQL day to day to debug and fix issues with client processes.
  • Managed and reviewed log files.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (see the partitioning sketch after this list).
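
As a rough illustration of the pandas/Matplotlib analysis mentioned above, the sketch below ranks data patterns by how well a simple rolling-mean forecast predicts the next point; the input file name, column layout, and the rolling-mean approach are assumptions rather than the client's actual method.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical extract of daily metrics, one column per data pattern.
    df = pd.read_csv("daily_metrics.csv", parse_dates=["date"]).set_index("date")

    # Descriptive statistics per pattern.
    print(df.describe())

    # Naive one-step-ahead forecast per column using a 7-day rolling mean,
    # scored by mean absolute error to rank which pattern predicts best.
    errors = {}
    for col in df.columns:
        predicted = df[col].rolling(window=7).mean().shift(1)
        errors[col] = (df[col] - predicted).abs().mean()
    best = min(errors, key=errors.get)
    print("pattern with lowest prediction error:", best)

    # Chart for the weekly status deck.
    df[best].plot(title="Best-predicting pattern: " + best)
    plt.savefig("weekly_report.png")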
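
The Hive partitioning and bucketing mentioned in the last bullet typically looks like the DDL below, here issued through the PyHive client for illustration; the HiveServer2 host, table names, and web-log schema are hypothetical.

    from pyhive import hive  # illustrative client; the same HiveQL works from the Hive CLI

    conn = hive.Connection(host="hiveserver2.example.com", port=10000)  # hypothetical host
    cur = conn.cursor()

    # Web-log table partitioned by event date and bucketed by user id (illustrative schema).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS weblogs_part (
            user_id STRING,
            url     STRING,
            status  INT
        )
        PARTITIONED BY (event_date STRING)
        CLUSTERED BY (user_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Dynamic-partition load from a staging table.
    cur.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
    cur.execute("SET hive.enforce.bucketing=true")  # needed on older Hive releases
    cur.execute("""
        INSERT OVERWRITE TABLE weblogs_part PARTITION (event_date)
        SELECT user_id, url, status, event_date FROM weblogs_stage
    """)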

Environment: Hadoop, Cloudera 5.3.0, ETL, Eclipse, R, Python, MapReduce, Hive, Pig, HBase, PuTTY, Sqoop, Flume, Scala, Spark, Linux, Java, Tableau, HDFS, JDK, MS SQL, Vertica and CentOS.

Confidential, Philadelphia, PA.

Hadoop Consultant.

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, and Sqoop.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Extensively involved in the design phase and delivered design documents.
  • Involved in testing and coordinated user testing with the business.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Configured RabbitMQ to act as an interface between the mobile phone (client) and the server.
  • Implemented RabbitMQ as a middleware queuing service to store messages in a queue.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data (see the Hive sketch after this list).
  • Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
  • Defined job flows.
  • Used Hive to analyze the partitioned and bucketed data to compute various metrics for reporting.
  • Managed and reviewed the Hadoop log files.
  • Used Pig as an ETL tool to perform transformations, joins, and pre-aggregations before storing the data in HDFS.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Managed data coming from different sources.
  • Created Hive tables, loaded data, and wrote Hive queries.
  • Utilized the Cloudera Apache Hadoop environment.
  • Worked on the data lake creation process.
  • Created the data model for Hive tables.
  • Involved in unit testing and delivered unit test plans and results documents.
  • Exported data from HDFS into an RDBMS using Sqoop for report generation and visualization.
  • Worked on the Oozie workflow engine for job scheduling.
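
The Hive log-parsing jobs referenced above generally follow the pattern of an external table with a regex SerDe over the raw logs, then aggregate queries for reporting. The sketch below shows that shape via the PyHive client; the host, HDFS path, log format, and regex are assumptions for illustration.

    from pyhive import hive  # illustrative client; the same HiveQL works from the Hive CLI

    conn = hive.Connection(host="hiveserver2.example.com", port=10000)  # hypothetical host
    cur = conn.cursor()

    # External table over raw application logs, parsed with a regex SerDe.
    cur.execute("""
        CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (
            log_ts    STRING,
            log_level STRING,
            message   STRING
        )
        ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
        WITH SERDEPROPERTIES ('input.regex' = '^([^ ]+ [^ ]+) ([^ ]+) (.*)$')
        LOCATION '/data/app/logs'
    """)

    # Reporting metric over the structured logs: error counts per day.
    cur.execute("""
        SELECT to_date(log_ts) AS log_day, COUNT(*) AS errors
        FROM raw_logs
        WHERE log_level = 'ERROR'
        GROUP BY to_date(log_ts)
        ORDER BY log_day
    """)
    for log_day, errors in cur.fetchall():
        print(log_day, errors)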

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Linux, Java, Oozie, HBase, RabbitMQ.

Confidential, Phoenix, AZ

MS SQL/ETL Consultant

Responsibilities:

  • Worked as a developer and administrator on MS SQL Server.
  • Maintained the client relationship by communicating daily and weekly project status.
  • Developed complex T-SQL code.
  • Created database objects: tables, indexes, views, user-defined functions, cursors, triggers, stored procedures, constraints, and roles.
  • Used SQL Profiler to review index performance and largely eliminate table scans.
  • Maintained table performance by following tuning tips such as normalization, creating indexes, and collecting statistics.
  • Managed and monitored the use of disk space.
  • Analyzed, defined, and developed build process improvements using TFS.
  • Created groups/users and defined user permissions for the projects in TFS.
  • Maintained the consistency of the client's database using DBCC.
  • Created indexes on selective columns to speed up queries and analysis in SQL Server Management Studio.
  • Implemented triggers and stored procedures and enforced business rules via checks and constraints.
  • Performed data transfers using the BCP and BULK INSERT utilities.
  • Implemented mirroring and log shipping for disaster recovery.
  • Executed transactional and snapshot replication.
  • Performed all aspects of database administration, including data modeling, backups, and recovery.
  • Troubleshot replication problems.
  • Checked database health using DBCC commands and DMVs.
  • Generated server-side T-SQL scripts for data manipulation and validation, and created various snapshots and materialized views for remote instances.
  • Tuned stored procedures by adding TRY...CATCH blocks for error handling (see the sketch after this list).
  • Tested and optimized stored procedures and triggers for use in production.
  • Worked with Data Modeler and DBAs to build the data model and table structures. Actively participated in discussion sessions to design the ETL job flow.
  • Updated mappings, sessions, and workflows as part of ETL changes, modified existing ETL code, and documented the changes.
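
The stored-procedure error handling mentioned above follows the standard T-SQL TRY...CATCH pattern; the sketch below creates and calls such a procedure through pyodbc for illustration. The connection string, Invoice and ErrorLog tables, and the procedure itself are hypothetical, and THROW assumes SQL Server 2012 or later.

    import pyodbc  # illustrative client; the procedure body is plain T-SQL

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlhost;DATABASE=Sales;"
        "Trusted_Connection=yes", autocommit=True)          # hypothetical connection
    cur = conn.cursor()

    cur.execute("""
    IF OBJECT_ID('dbo.usp_ApplyPayment', 'P') IS NOT NULL
        DROP PROCEDURE dbo.usp_ApplyPayment
    """)

    # Procedure wrapped in TRY...CATCH so a failure rolls back and is logged
    # instead of leaving a partial update.
    cur.execute("""
    CREATE PROCEDURE dbo.usp_ApplyPayment
        @InvoiceId INT,
        @Amount    MONEY
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRY
            BEGIN TRANSACTION;
            UPDATE dbo.Invoice
               SET Balance = Balance - @Amount
             WHERE InvoiceId = @InvoiceId;
            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            IF @@TRANCOUNT > 0
                ROLLBACK TRANSACTION;
            INSERT INTO dbo.ErrorLog (ErrorNumber, ErrorMessage, LoggedAt)
            VALUES (ERROR_NUMBER(), ERROR_MESSAGE(), SYSDATETIME());
            THROW;   -- re-raise so the caller sees the failure
        END CATCH
    END
    """)

    # Example call from a test harness.
    cur.execute("EXEC dbo.usp_ApplyPayment @InvoiceId = ?, @Amount = ?", 1001, 250.00)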

Environment: MS SQL Server, Business Intelligence Development Studio (BIDS), SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), SQL scripts, Linux scripts, Unix, Windows, T-SQL, TFS 2012, ETL, Stored Procedures, MS Access, MS Visio and MS Excel.

Confidential

Web Developer

Responsibilities:

  • Designed user screens and validations using HTML, jQuery, Ext JS, and JSP per user requirements.
  • Created front-end interfaces and interactive user experiences using HTML, CSS, and JavaScript.
  • Validated client-interface JSP pages using Struts form validation.
  • Used AJAX with JavaScript for richer graphics and page layout options.
  • Used APIs and SOAP for transferring data and information between websites.
  • Worked with the Struts framework to create the web application.
  • Developed Servlets, JSP and Java Beans using Eclipse.
  • Designed and developed Struts action classes for the controller responsibility.
  • Involved in the integration of Spring for implementing Dependency Injection (DI/IoC).
  • Wrote POJOs, Hibernate mapping XML files, and HQL.
  • Involved in database design and creating relational tables.
  • Utilized Agile Scrum to manage full life-cycle development of the project.
  • Built and deployed EAR, WAR, and JAR files to test, stage, and production servers.
  • Handled version control and configuration management using SVN.

Environment: HTML, CSS, XML, DHTML, XHTML, DOM, POJO, SQL, SOAP, JSP, JavaScript, jQuery, AJAX, JSON, Eclipse.
