
Big Data Architect Resume


Manhattan, New York

SUMMARY:

  • 9+ years of professional experience in IT software and services, covering the design, development, and testing of applications in the telecom and financial domains.
  • Expertise in HDFS, MapReduce, YARN, Hive, HBase, Pig, Phoenix, Sqoop, Flume, Kafka, Apache NiFi, ZooKeeper, Apache Kylin, Oozie, and various other ecosystem components, with administration skills such as HDP cluster setup and installation.
  • Good understanding of business intelligence, ETL transformations, and Hadoop cluster management.
  • Experience in building data ingestion, extraction, and transformation pipelines for various datasets on HDFS.
  • Expertise in implementing HBase schemas with optimized row-key design to avoid hot-spotting (see the sketch after this summary).
  • Experience in populating HBase tables via Phoenix to expose data for Spotfire analytics.
  • Involved in designing Hive schemas using performance-tuning techniques such as partitioning and bucketing.
  • Optimized HiveQL by using the Tez execution engine.
  • Imported and exported data between relational databases and HDFS/Hive using Sqoop.
  • Experience in Hive ETL transformations and HBase loads via Phoenix to support data-analytics visualization.
  • Analyzed data using Hive queries and Pig scripts to study customer behavior.
  • Developed Pig UDFs to pre-process data for analysis.
  • Used SFTP to transfer files to the server.
  • Implemented Apache NiFi real-time streaming for various types of clinical and non-clinical data.
  • Experience with Talend Big Data Platform Studio; implemented financial-audit ETL transformation flows.
  • Implemented a Splunk Enterprise environment in the HDP cluster for log aggregation and analysis of ecosystem issues.
  • Implemented AppDynamics in HDP for cluster monitoring and alerting.
  • Good understanding of Java object-oriented concepts and development of multi-tier enterprise web applications.
  • Knowledge of cloud services such as Azure and AWS EC2 instances.
  • Experience with operating systems including Windows, Linux, and Mac OS.
  • Good understanding of the complete SDLC and STLC.
  • Strong troubleshooting and production-support skills, with the ability to interact effectively with end users.
  • Motivated to take independent responsibility as well as to contribute as a productive team member.
  • Self-motivated, with a strong desire to learn, and an effective team player.
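
The row-key bullet above references salting; a minimal, illustrative Java sketch of that technique follows. The table name, column family, and key layout are hypothetical, not drawn from any specific engagement.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SaltedRowKeyWriter {

        private static final int SALT_BUCKETS = 16; // align with the number of pre-split regions

        // Prefix the natural key with a hash-derived salt so that monotonically
        // increasing keys (e.g. timestamps) spread across regions instead of
        // hammering a single region server.
        static String saltedKey(String naturalKey) {
            int bucket = Math.abs(naturalKey.hashCode() % SALT_BUCKETS);
            return String.format("%02d|%s", bucket, naturalKey);
        }

        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("event_log"))) {  // hypothetical table
                String naturalKey = "2017-03-01T10:15:00|ACCT12345";             // hypothetical key
                Put put = new Put(Bytes.toBytes(saltedKey(naturalKey)));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event_type"), Bytes.toBytes("LOGIN"));
                table.put(put);
            }
        }
    }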

TECHNICAL SKILLS:

Hadoop Ecosystem: Hadoop 2.7.1, MapReduce, YARN, HBase 1.1.2, Sqoop 1.4.6, Oozie 4.2.0, Hive 1.2.1, Pig 0.15.0, ZooKeeper 3.4.6, Splunk, AppDynamics, AWS EC2, Talend Big Data Platform 6.2, MongoDB

Cluster: Hortonworks 2.4, Cloudera CDH5.7, Confidential Big Insights 4.1

Languages: Core Java, C, JSP, EAI, Shell Script, R

Web Technologies: HTML5, CSS3, JavaScript, jQuery, XML, XHTML

Servers/Tools: PuTTY, WebSphere, WebLogic, JBoss, Apache Tomcat

Databases: MySQL, Oracle, PL/SQL, NoSQL (HBase)

IDEs: Eclipse Mars.1

PROFESSIONAL EXPERIENCE:

Confidential, Manhattan, New York

Big Data Architect

Responsibilities:

  • Architecting and providing business solutions for various project use cases at Mount Sinai.
  • Ingested clinical data from healthcare providers into Azure Data Lake/HDFS.
  • Implemented a POC on Kyligence/Apache Kylin (OLAP): analyzed datasets at sub-second latency by building cubes in Kyligence and reporting through Tableau dashboards.
  • Implemented a POC on MuleSoft-MongoDB integration: set up the MongoDB instance on a terminal server and built API connectivity by streaming JSON data into MongoDB via MuleSoft connectors.
  • Tuned Hive/Sqoop jobs for effective automation of weekly and monthly loads.
  • Implemented NiFi real-time streaming of clinical and non-clinical data from the HIE (Health-X Centre) into Hive/HDFS, and established a Kafka consumer to feed Pentaho jobs in real time (a consumer sketch follows this list).
  • Implemented a POC on the Shareinsights big data analytics platform for rapid data preparation, processing, and visualization over millions of rows and terabytes of data.
  • Provided infrastructure for the data science team's NLP POC by setting up R/RStudio in the HDP cluster and establishing R-Hive connectivity for various Active Directory users.
  • Loaded raw/XML/log files into HDFS and moved the data into Hive databases for analytics.
  • Used Hive to analyze the data and identify correlations.
  • Tuned MapReduce jobs in HDP to improve job performance.
  • Defined implementation strategies for capturing Hive logs through Splunk to monitor machine-critical customer information.
  • Used Tableau for visualizing clinical data and Patient 360 information.
  • Collaborated with Hortonworks and Kyligence enterprise support teams in a timely manner on ongoing implementation issues.
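
A minimal sketch of the Kafka consumer pattern referenced above for feeding downstream Pentaho processing. The broker address, consumer group, and topic name are hypothetical placeholders, not the project's actual configuration.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ClinicalFeedConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:6667");        // hypothetical HDP broker address
            props.put("group.id", "clinical-feed-consumers");      // hypothetical consumer group
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("hie_clinical_events")); // hypothetical topic
                while (true) {
                    // Poll the topic and hand each message to downstream processing
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                    }
                }
            }
        }
    }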

Environment: HDP 2.5 (CentOS): Hadoop ecosystem (HDFS, MapReduce, YARN, Sqoop, Hive), NoSQL (MongoDB and HBase), Phoenix, Oozie, ZooKeeper, ETL (Pentaho), MuleSoft, Tableau, Splunk, Kylin/Kyligence, Dataiku, R and RStudio, Data Portal, DataWIKI, and Apache NiFi.

Confidential, Alpharetta, Georgia

Senior Hadoop Developer/Lead (ETL)

Responsibilities:

  • Ingested SAP and generic data for various clients into HDFS via the EY Helix UI web application.
  • Created clients, engagements, workspaces, and file sets for each client's financial-audit year to analyze meaningful business insights from GL, PP, and RR data.
  • Imported data from various sources into HDFS and performed ETL transformations using Hive based on the client's data.
  • Performed SAP data transformations via Talend ETL; generic data (Oracle, CSV, etc.) transformations ran via Oozie workflows.
  • Wrote Hive UDFs for complex transformations in the staging layer (see the UDF sketch after this list).
  • Loaded raw files into HDFS and moved the data into Hive databases for analytics.
  • Developed Talend packages using HiveQL and created Hive data warehouses in Hadoop.
  • Used Hive to analyze the data and identify correlations.
  • Tuned MapReduce jobs in HDP to improve job performance.
  • Parallelized sequential Talend jobs and optimized query performance for the SAP dataset.
  • Provisioned ETL data from the various stages into CDM and RDM transformation tables in HBase.
  • Coordinated with MTK Central to upload the various templates to Spotfire for report generation.
  • Worked with Splunk for log management; provided Splunk search queries to aggregate log events and analyze/monitor root causes of issues.
  • Used AppDynamics to monitor the cluster environment and health and to generate reports and alerts based on business requirements.
  • Integrated AppDynamics and Splunk to monitor the cluster and analyze events.
  • Used Spotfire to visualize audit data and generate meaningful reports for clients.
  • Held weekly meetings with technical collaborators and actively participated in code-review sessions with senior and junior developers.
  • Migrating the cluster to the cloud: ongoing implementation of the Azure big data platform.
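
A minimal sketch of a Hive UDF of the kind mentioned above for staging-layer transformations. The function name and normalization rule are illustrative assumptions, not the actual audit logic.

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Simple Hive UDF for staging-layer cleanup: normalizes a free-text
    // account code into a canonical, trimmed, upper-case form.
    @Description(name = "normalize_code", value = "_FUNC_(str) - trims and upper-cases an account code")
    public class NormalizeCodeUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Such a UDF would be packaged into a JAR and registered in HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION before being used in the staging queries.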

Environment: HDP 2.4 (CentOS): HUEY (Helix User Interface), MapReduce/YARN, Hive 1.2.1, HBase 1.1.2, Phoenix, Splunk, AppDynamics, Oozie 4.2.0, Ranger 0.6.0, Ambari 2.4.0, Spotfire, TFS (Team Foundation Server), Git, DbVisualizer

Confidential, Fort Worth, Texas

Hadoop Developer/Administration

Responsibilities:

  • Worked on a live Big Data Hadoop production environment with 100 nodes.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Worked with HBase databases for non-relational data storage and retrieval on enterprise use cases.
  • Implemented a POC Spark cluster on AWS.
  • Analyzed system failures, identified root causes, and recommended courses of action.
  • Imported logs from web servers with Flume to ingest the data into HDFS.
  • Exported data from HDFS into relational databases with Sqoop; parsed, cleansed, and mined useful, meaningful data in HDFS using MapReduce for further analysis.
  • Fine-tuned Hive jobs for optimized performance.
  • Implemented job workflows and scheduling for end-to-end application processing.
  • Extended the functionality of Hive and Pig with custom UDFs and UDAFs (see the Pig UDF sketch after this list).
  • Used Flume to collect, aggregate and store the web log data onto HDFS.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Managed and reviewed Hadoop log files.
  • Used Hive to analyze the data and identify correlations.
  • Worked on importing and exporting data between Oracle/DB2 and HDFS using Sqoop.
  • Automated all jobs that pull data from the FTP server and load it into Hive tables, using the Oozie coordinator.
  • Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Developed MapReduce/Spark modules for predictive analytics in Hadoop/Hive on AWS.
  • Held weekly meetings with technical collaborators and actively participated in code-review sessions with senior and junior developers.
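
A minimal sketch of a Pig EvalFunc of the kind referenced above for extending Pig; the cleaning rule is an illustrative assumption.

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Example Pig UDF used to pre-process raw records: strips surrounding
    // quotes and whitespace from a field before analysis.
    public class CleanField extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().replace("\"", "").trim();
        }
    }

After REGISTERing the containing JAR in the Pig script, the function can be called like any built-in, e.g. CleanField(raw_field).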

Environment: HDP 2.3 (CentOS): Apache Hadoop 2.7.1, HDFS, MapReduce/YARN, HBase 1.1.2, Sqoop 1.4.6, Oozie 4.2.0, Hive 1.2.1, NoSQL, ETL, MySQL, Teradata, AWS (Amazon Web Services)

Confidential, Fairfax, VA

Hadoop Developer (ETL)

Responsibilities:

  • Configured Hadoop components including Hive, Pig, HBase, Sqoop, Oozie and Hue in the client environment.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in HDFS (see the mapper sketch after this list).
  • Applied knowledge of various Java, J2EE, and EAI patterns.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with HDFS reference tables and historical metrics.
  • Enabled speedy reviews and first-mover advantage by defining Oozie job flows to automate data loading into HDFS and Pig jobs to pre-process the data.
  • Designed the HBase schema to avoid hot-spotting and exposed data from HBase tables through a REST API to the UI.
  • Developed Pig scripts to transform raw data from several data sources into forming baseline data and loaded the data into HBase tables.
  • Involved in creating POCs to ingest and process streaming data using Spark and HDFS.
  • Used Flume to collect, aggregate, and store the log data from different web servers.
  • Developed Shell Scripts to automate the batch processing and processed the daily jobs through Maestro scheduler.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Coordinated with the offshore team and cross-functional teams to ensure that applications were properly tested, configured, and deployed.
  • Used Tableau for visualization and report generation.
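
An illustrative Java mapper of the kind described above for parsing raw delimited records before they land in partitioned staging tables; the field layout and delimiter are hypothetical.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Parses pipe-delimited raw records and keys each cleaned record by its date
    // field so the output can be written into date-partitioned staging directories.
    public class RawRecordMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final Text outKey = new Text();
        private final Text outValue = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length < 3) {
                return; // skip malformed records
            }
            outKey.set(fields[2].trim());                       // hypothetical partition column (record date)
            outValue.set(fields[0].trim() + "," + fields[1].trim());
            context.write(outKey, outValue);
        }
    }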

Environment: CDH 5.7.0 (CentOS): Apache Hadoop 2.7.1, MapReduce, HBase 1.1.2, Pig 0.15.0, Sqoop 1.4.6, Oozie 4.2.0, Java 8, Autosys, Hive 1.2.1, Impala, ZooKeeper 3.4.6, Oracle 11g, PL/SQL, SQL Developer 4.0, UNIX, REST APIs/web services, SQL, ANT, Shell Script, Java, J2EE

Confidential

Technical Lead/Freelancer

Responsibilities:

  • Designing and implementing new features and functionality
  • Establishing and guiding the website’s architecture
  • Ensuring high performance and availability, and managing all technical aspects of the CMS.
  • Helping formulate an effective, responsive design and turning it into a working theme and plugin.
  • Setting up the network and applications (e.g., Tomcat) as part of the service
  • Installing/configuring third-party software
  • Troubleshooting/incident investigation
  • Monitoring and Capacity planning.
  • Installing and configuring monitoring software to support others in their roles

Environment: Web development, Web Hosting, Configuration, WordPress/CMS, HTML, CSS, PHP, Google Analytics, Web deployment, Linux/Windows.

Confidential

Technical Lead

Responsibilities:

  • Worked on Sqoop to import CDR data from various relational data sources into the big data platform.
  • Worked with Flume to bring clickstream data in from front-facing application logs.
  • Worked on strategizing Sqoop jobs to parallelize data loads from source systems.
  • Participated in providing inputs for the design of the ingestion patterns.
  • Participated in strategizing loads so as not to impact front-facing applications.
  • Worked on the design of the Hive data store to hold data from various data sources.
  • Developed MapReduce jobs to operate on streaming data and MRUnit tests to validate them (see the test sketch after this list).
  • Involved in providing inputs to the analyst team for functional testing.
  • Worked with source-system load-testing teams to perform loads while ingestion jobs were in progress.
  • Worked on performing data standardization using Pig scripts.
  • Worked on building analytical data stores for the data science team's model development.
  • Worked on the design and development of Oozie workflows to orchestrate Pig and Hive jobs.
  • Worked on performance tuning of Hive queries with partitioning and bucketing.
  • Worked with the Ambari UI to configure alerts for Hadoop ecosystem components.
  • Participated in tuning various components in the Hadoop ecosystem.
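
A minimal MRUnit sketch of the kind of test referenced above for validating mapper logic against sample CDR lines before running on the cluster; the mapper and record layout are hypothetical, included only to keep the example self-contained.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class CdrMapperTest {

        // Minimal mapper under test: emits (call date, 1) for each pipe-delimited CDR line.
        static class CdrMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\\|");
                if (fields.length >= 3) {
                    context.write(new Text(fields[2]), new IntWritable(1));
                }
            }
        }

        private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

        @Before
        public void setUp() {
            mapDriver = MapDriver.newMapDriver(new CdrMapper());
        }

        @Test
        public void emitsCallDateForValidRecord() throws IOException {
            mapDriver.withInput(new LongWritable(0), new Text("MSC01|12125551234|2014-06-01"))
                     .withOutput(new Text("2014-06-01"), new IntWritable(1))
                     .runTest();
        }
    }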

Environment: Hadoop 2.2 (Confidential Big Insights): Apache Sqoop 1.4.5, HDFS, MapReduce, Hive 0.14.0, Tez 0.5.2, Pig 0.14.0, Oozie 4.1.0, HBase 0.98.4, Ambari 2.0.0, JUnit, ZooKeeper, Maven, Hadoop data lake on Linux (CentOS), Oracle 11g, SQL Developer 4.1.3, Unix/Shell scripting.

Confidential

Application Developer

Responsibilities:

  • Implemented CDR batch processing for MSC, IN, ADSL, and Confidential streams using object-oriented programming.
  • Involved in the review and analysis of functional specifications, requirement-clarification defects, etc.
  • Involved in the analysis and design of the initiatives.
  • Involved in the development of the mediation platform using Java.
  • Involved in design and implementation of migration of Mediation legacy platform to mediation zone.
  • Involved in writing JUnit tests for unit testing (see the test sketch after this list).
  • Involved in Regression Testing of SAT and Development Environments.
  • Involved in writing the SQL queries and stored procedures.
  • Participated in the test case reviews, and manual testing of the enhancements during Release 1.5.
  • Involved in fixing the defects during integration testing.
  • Built and deployed the application using Ant scripts onto the dev and testing environments.
  • Participated in the code reviews for various initiatives, Performed Static Code Analysis to follow the Best Practices for Performance and Security.
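
A small JUnit 4 sketch in the spirit of the unit tests mentioned above; CdrRecord and its rounding rule are hypothetical stand-ins for the mediation-platform classes, included only to keep the example self-contained.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class CdrRecordTest {

        // Minimal hypothetical record type used only to make the test self-contained.
        static class CdrRecord {
            private final long durationSeconds;

            CdrRecord(String rawDurationField) {
                this.durationSeconds = Long.parseLong(rawDurationField.trim());
            }

            long billableMinutes() {
                // Round call duration up to the next full minute for billing
                return (durationSeconds + 59) / 60;
            }
        }

        @Test
        public void roundsPartialMinutesUp() {
            assertEquals(3, new CdrRecord(" 121 ").billableMinutes());
        }
    }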

Environment: Mediation Zone (Digital Route), Legacy Zone, Core Java 6

Confidential

Application Developer & Tester

Responsibilities:

  • Analyzing the High Level Design (HLD) and Business Requirements.
  • Providing daily updates to the on-site team over call and making enhancements.
  • Implemented SQL and PL/SQL scripts including Stored Procedures, functions, packages and triggers.
  • Interpreting product requirements into test requirements, writing test plans and test cases.
  • Made enhancements to the application which presented me with the opportunity to go through the entire SDLC.
  • Preparing the Test Cases and Test Execution Plan.
  • Code coverage and Test case presentation.
  • Involved in Test Planning and Test case execution.
  • Preparing TPI in the test planning phase and the execution phase.
  • Raised defects in QC.
  • Reviewing the test plan document to ensure that no functionality or requirements are missing.
  • Used web services such as SOAP/WSDL to communicate over the internet.
  • Automated the tests using the integration tool (SOAP-UI).

Environment: webMethods, SoapUI, Siebel, Citrix, PuTTY, Confidential Rational ClearQuest.
