
Sr. Hadoop Consultant Resume


Baltimore, MD

SUMMARY

  • Overall 8 years of experience in analysis, design, development, testing, and deployment in the IT industry.
  • Knowledge of all phases of the software development life cycle (SDLC).
  • 5 years of experience installing, configuring, and testing Hadoop ecosystem components.
  • Experienced in working with Pig, Hive, Sqoop and MapReduce.
  • Extensive experience with HBase and Flume.
  • Worked on Spark Streaming using Scala.
  • Worked on the PySpark module (a brief PySpark sketch follows this summary).
  • Knowledge of Jenkins and Git tools.
  • Worked on Tableau dashboards.
  • Integrated SQL data sources with Tableau.
  • Good experience working with the Hortonworks, MapR and Cloudera distributions.
  • Created a 360-degree view of customer data for a financial client in a Hadoop data lake.
  • Implemented a multi-node Hadoop cluster on AWS storage.
  • Worked on different ETL processes for the data ingestion module.
  • Extensive experience with ETL technologies, such as Informatica.
  • Experience in end-to-end design, development, maintenance and analysis of various types of applications using efficient data science methodologies and Hadoop ecosystem tools.
  • Experience in providing solution architecture for Big Data projects using the Hadoop ecosystem.
  • Experienced in Hadoop cluster setup, performance tuning, developing logical and physical data models using Hive for analytics, data lake creation using Hive, and data load management using Sqoop.
  • Implemented Hive workflows within Cascading flows and Cascades.
  • Performed data processing operations in Scala with Scalding.
  • Developed Cascading applications on Hadoop that integrate with Teradata.
  • Experience in Linux shell scripting.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
  • Familiar with data architecture including data ingestion pipeline design, Hadoop Information architecture, data modeling and data mining, machine learning and advanced data processing.
  • Database/ETL performance tuning: broad experience in database development, including effective use of database objects, SQL Trace, Explain Plan, different types of optimizers, hints, indexes, table partitions, sub-partitions, materialized views, global temporary tables, autonomous transactions, bulk binds and MS SQL built-in functions; coding of database objects such as triggers, procedures, functions and views.
  • Performance tuning of Informatica mappings and workflows.
  • Exposure to T-SQL programming and architecture; translated complex legacy processes into T-SQL procedures, functions and packages.
  • Knowledge of working with star schema and snowflake schema.
  • Excellent interpersonal skills and strong analytical and problem-solving skills with a customer-oriented attitude.
  • A very good team player, self-motivated and dedicated in any work environment.
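
To make the PySpark experience above concrete, the following is a minimal, illustrative sketch (not code from any of the engagements below) of loading ingested data from HDFS and querying it with Spark SQL. The application name, HDFS path and column names are hypothetical placeholders.

```python
# Minimal PySpark sketch: load a CSV landed in HDFS and run a Spark SQL aggregation.
# Paths, column names and the view name are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("customer-360-example")   # hypothetical application name
         .getOrCreate())

# Read a delimited file that Sqoop (or another ingestion job) landed in HDFS.
customers = (spark.read
             .option("header", "true")
             .option("inferSchema", "true")
             .csv("hdfs:///data/landing/customers.csv"))   # placeholder path

customers.createOrReplaceTempView("customers")

# Spark SQL query over the ingested data.
summary = spark.sql("""
    SELECT state, COUNT(*) AS customer_count
    FROM customers
    GROUP BY state
    ORDER BY customer_count DESC
""")
summary.show()
```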

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Spark, Scala, Cascading, Zookeeper, Hive, Pig, Sqoop, Flume and Pivotal HD.

Programming Languages: Java/J2EE, Python, Scala, C, R.

Scripting Languages: JSP & Servlets, JavaScript, XML, HTML and Bash.

Databases: MS SQL, Oracle, Vertica, NoSQL, MongoDB.

IDEs & Tools: Rational Rose, Rational Team Concert, Eclipse, NetBeans, JUnit, jQuery, MQ, TOAD, SQL Developer, Microsoft Visual Studio 2008/10, Yum, RPM.

Versioning Tools: SVN, CVS, Dimensions, and MS Team Foundation Server

Scripts & Libraries: JavaScript, AngularJS, Node.js, FreeMarker, Groovy, Maven, Ant scripts, XML DTDs, XQuery, XPath, XSLT, XSDs, JAXP, SAX and JDOM.

Markup Languages: XSLT, XML, XSL, HTML5, DHTML, CSS, OOCSS, jQuery, AJAX.

Operating Systems: Red Hat Linux 6.2/6.3, Unix, Solaris, Windows 7/8, Linux.

PROFESSIONAL EXPERIENCE

Confidential, Baltimore, MD

Sr. Hadoop Consultant

Responsibilities:

  • Experience in providing solution architecture for Big Data projects using the Hadoop ecosystem.
  • Experienced in Hadoop cluster setup, performance tuning, developing logical and physical data models using Hive for analytics, file processing using Pig, and data load management using Sqoop.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Wrote Pig scripts to process the data.
  • Worked extensively on Hive.
  • Implemented in-memory computing capabilities with Apache Spark, written in Scala.
  • Processed real-time data using Spark Streaming.
  • Used Spark SQL to query the Spark Streaming data directly (a brief sketch follows this list).
  • Worked extensively on the Spark core and Spark SQL modules.
  • Plugged Elasticsearch into Cascading flows.
  • Developed multiple POCs using Spark Streaming, deployed them on the YARN cluster, and compared the performance of Spark with Storm.
  • Experience in using Cloudera Manager.
  • Continuously monitored and managed the Hadoop cluster through MapR.
  • Did a POC on Greenplum with Spark Streaming.
  • Worked on Tableau dashboards.
  • Integrated SQL data sources with Tableau.
  • Applied business concepts to design and maintain an internal database for the Advanced Analytics group with MySQL and MemSQL, including database backup, restore and optimization.
  • Used ETL to extract files for the external vendors and coordinated that effort
  • Used Change Data Capture (CDC) to simplify ETL in data warehouse applications.
  • Written Hive queries for data analysis to meet the Business requirements.
  • Developed and executed shell scripts to automate the jobs
  • Supported MapReduce programs running on the cluster.
  • Experienced in defining job flows.
  • Worked on HBase; moved data from HDFS into HBase for analysis.
  • Experienced in managing and reviewing Hadoop log files.
  • Developed shell scripts to automate routine DBA tasks (e.g. database refreshes, backups, monitoring).
  • Moved files in various formats such as TSV and CSV from RDBMS to HDFS for further processing.
  • Gathered the business requirements by coordinating and communicating with business team.
  • Prepared the documents for the mapping design and production support.
  • Wrote Apache Pig scripts to process the HDFS data and send it to HBase.
  • Involved in developing Hive reports and partitioning Hive tables.
  • Moved user information from MS SQL Server to HBase using Sqoop.
  • Involved in integration of Hive and HBase.
  • Created MapReduce jobs using Hive/Pig Queries.
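
The bullet on querying Spark Streaming data with Spark SQL refers to the micro-batch pattern sketched below. This is a hedged, DStream-style (Spark 1.x era) illustration, not the project's actual code; the socket source, record format and batch interval are assumptions.

```python
# Hedged sketch: run Spark SQL over each Spark Streaming micro-batch (DStream API).
# The socket source, record layout and batch interval are illustrative only.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.sql import SQLContext, Row

sc = SparkContext(appName="streaming-sql-example")
ssc = StreamingContext(sc, 10)          # 10-second micro-batches
sqlContext = SQLContext(sc)

# Each line is assumed to be "user_id,event_type" -- a placeholder format.
lines = ssc.socketTextStream("localhost", 9999)
events = (lines
          .map(lambda line: line.split(","))
          .map(lambda parts: Row(user_id=parts[0], event_type=parts[1])))

def process(time, rdd):
    """Register each micro-batch as a temp table and query it with Spark SQL."""
    if rdd.isEmpty():
        return
    df = sqlContext.createDataFrame(rdd)
    # Spark 1.x-era call; use createOrReplaceTempView on newer versions.
    df.registerTempTable("events")
    counts = sqlContext.sql(
        "SELECT event_type, COUNT(*) AS cnt FROM events GROUP BY event_type")
    counts.show()

events.foreachRDD(process)

ssc.start()
ssc.awaitTermination()
```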

Environment: Linux, MapR 5.0, Eclipse, Elasticsearch, Cascading, Kerberos, Ranger, HDFS, Pig, Hive, Sqoop, Flume, Java, JEE, Python, Spark, HBase, SQL Server 2014.

Confidential, New York city, NY

Hadoop Consultant

Responsibilities:

  • Performed root cause analysis in the client data warehouse.
  • Maintained client relationship by communicating the daily status and weekly status of the project.
  • Involved in analyzing system failures, identifying root causes and recommending courses of action.
  • Documented the system processes and procedures for future reference.
  • Coded in Python with the NumPy, SciPy and Pandas modules. Performed statistical analysis on different data patterns to recommend the top-valued patterns with the highest accuracy in predicting the next data point. Used Matplotlib and Excel to generate charts and reported to the customer in weekly meetings using PowerPoint (a brief Pandas sketch follows this list).
  • Worked on a POC that built custom Python scripts for pre-load, post-load and control processes while loading data into Vertica from Hadoop.
  • Used the MADlib and Spark MLlib packages to train and model the incoming real-time data along with the historical data.
  • Implemented the partitioning strategy at the table level to streamline the load process on the Vertica database
  • Participated in the design review for migrating the platform from MS SSAS to Vertica/ROLAP. Refactored multiple ETL approaches into a single modular one and refined the security scheme to enable multiple layers of defense.
  • Involved in design, analysis, implementation, testing and support of ETL processes for Stage, ODS and Mart.
  • Integrated different data sources using Hive to create a single large table in the data lake.
  • Prepared ETL standards and naming conventions, and wrote ETL flow documentation for Stage, ODS and Mart.
  • Used ETL to extract files for the external vendors and coordinated that effort
  • Used Change Data Capture (CDC) to simplify ETL in data warehouse applications.
  • Created a solution combining Vertica and Hadoop for clickstream data.
  • Created Python scripts to conduct routine maintenance and deliver ad hoc reports. Monitored and tuned user-developed JavaScript
  • Facilitated storage by identifying the need and subsequently developing JavaScript to archive GridFS collections.
  • Proactively developed and implemented a Python script to report the health and metadata of a shard cluster (a brief sketch follows the environment line for this project).
  • Designed and implemented sharding and indexing strategies.
  • Monitored deployments for capacity and performance.
  • Defined and implemented backup strategies per data retention requirements.
  • Developed and documented best practices for data migration.
  • Incident and problem management
  • Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing
  • Handling structured and unstructured data and applying ETL processes.
  • Managed and reviewed Hadoop log files.
  • Ran Hadoop Streaming jobs to process terabytes of data.
  • Importing and exporting data into HDFS and Hive using Sqoop
  • Proactively monitored systems and services; worked on architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Performed visualization on different input data using SQL integrated with Tableau.
  • Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile and network devices, and pushed it to HDFS.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure condition
  • Involved in loading data from UNIX file system to HDFS, configuring Hive and writing Hive UDFs
  • Utilized Java and MS SQL from day to day to debug and fix issues with client processes
  • Managed and reviewed log files
  • Implemented partitioning, dynamic partitions and buckets in Hive.
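
The Python analysis work above is illustrated by the following hedged Pandas/Matplotlib sketch. The input file, the assumption that all non-date columns are numeric metrics, and the naive rolling-mean "accuracy" score are placeholders, not the client's actual method.

```python
# Illustrative Pandas/Matplotlib sketch of scoring data patterns by how
# predictable their next point is. File name and columns are placeholders.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load a historical series exported from the warehouse (placeholder file).
df = pd.read_csv("daily_metrics.csv", parse_dates=["date"])

# Simple one-step-ahead baseline: predict each point from a 7-day rolling mean,
# then score each metric by how closely that naive prediction tracks the actuals.
scores = {}
for column in df.columns.drop("date"):       # assumes all other columns are numeric
    predicted = df[column].rolling(window=7).mean().shift(1)
    errors = (df[column] - predicted).dropna()
    scores[column] = 1.0 - np.abs(errors).mean() / df[column].abs().mean()

ranked = pd.Series(scores).sort_values(ascending=False)
print(ranked.head(10))                       # top-valued patterns by this simple score

# Chart the best-scoring metric for the weekly slide deck.
best = ranked.index[0]
df.plot(x="date", y=best, title=f"Best predicted metric: {best}")
plt.tight_layout()
plt.savefig("best_metric.png")
```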

Environment: Hadoop, Cloudera 5.3.0, ETL, Eclipse, R, Python, MapReduce, Hive, Pig, HBase, PuTTY, Sqoop, Flume, Scala, Spark, Linux, Java (JDK), Tableau, HDFS, MS SQL, Vertica and CentOS.
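
For the shard-cluster health report mentioned above, the following is a small, hypothetical pymongo sketch; the mongos address and the specific checks are assumptions, not the production script.

```python
# Hedged sketch of a shard-cluster health/metadata report using pymongo.
# The mongos URI is a placeholder; the actual checks and output format varied.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017/")   # placeholder mongos address

# Shard membership, as recorded by the cluster.
shards = client.admin.command("listShards")["shards"]
for shard in shards:
    print(f"shard {shard['_id']}: {shard['host']}")

# Per-database storage footprint -- a quick capacity signal.
for db_name in client.list_database_names():
    stats = client[db_name].command("dbStats")
    size_mb = stats["dataSize"] / (1024 * 1024)
    print(f"{db_name}: {size_mb:.1f} MB across {stats['collections']} collections")
```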

Confidential, Philadelphia, PA

Hadoop Consultant

Responsibilities:

  • Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
  • Installed Hadoop, MapReduce, HDFS, and developed multiple MapReduce jobs in PIG and Hive for data cleaning and pre-processing.
  • Coordinated with business customers to gather business requirements, also interacted with other technical peers to derive technical requirements and delivered the BRD and TDD documents.
  • Extensively involved in Design phase and delivered Design documents.
  • Involved in Testing and coordination with business in User testing.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data.
  • Involved in creating Hive tables, loading data and writing Hive queries that run internally as MapReduce jobs.
  • Experienced in defining job flows.
  • Used Hive to analyze the partitioned and bucketed data to compute various metrics for reporting (a brief sketch follows this list).
  • Experienced in managing and reviewing the Hadoop log files.
  • Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Responsible for managing data coming from different sources.
  • Involved in creating Hive Tables, loading data and writing Hive queries.
  • Utilized Apache Hadoop environment by Cloudera.
  • Created Data model for Hive tables.
  • Involved in Unit testing and delivered Unit test plans and results documents.
  • Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
  • Worked on Oozie workflow engine for job scheduling.
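
As a hedged illustration of the partitioned Hive reporting above, the sketch below uses PySpark with Hive support; the table, columns and the staging source (web_logs_staging) are hypothetical, and the actual work may well have used HiveQL or Pig directly.

```python
# Hedged PySpark-on-Hive sketch of a partitioned reporting table.
# Table names, columns and the metric query are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-metrics-example")
         .enableHiveSupport()              # assumes a Hive metastore is available
         .getOrCreate())

# A date-partitioned Hive table (structure only; names are hypothetical).
spark.sql("""
    CREATE TABLE IF NOT EXISTS web_logs (
        user_id     STRING,
        url         STRING,
        response_ms INT
    )
    PARTITIONED BY (log_date STRING)
    STORED AS ORC
""")

# Dynamic-partition load from a hypothetical staging table.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE web_logs PARTITION (log_date)
    SELECT user_id, url, response_ms, log_date
    FROM web_logs_staging
""")

# Typical per-partition reporting metric.
metrics = spark.sql("""
    SELECT log_date,
           COUNT(*)         AS requests,
           AVG(response_ms) AS avg_response_ms
    FROM web_logs
    GROUP BY log_date
""")
metrics.show()
```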

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Linux, Java, Oozie, HBase.

Confidential, Phoenix, AZ

MS SQL/ETL Consultant

Responsibilities:

  • Worked as a developer and administrator on MS SQL Server.
  • Maintained client relationship by communicating the daily status and weekly status of the project.
  • Developed complex T-SQL code.
  • Created Database Objects - Tables, Indexes, Views, User defined functions, Cursors, Triggers, Stored Procedure, Constraints and Roles.
  • Used SQL Profiler to review index performance and largely eliminate table scans.
  • Maintained table performance by following tuning practices such as normalization, creating indexes and collecting statistics.
  • Managed and monitored the use of disk space.
  • Maintained the consistency of the client's Database using DBCC.
  • Created indexes on selective columns to speed up queries and analysis in SQL Server Management Studio.
  • Implemented triggers and stored procedures and enforced business rules via checks and constraints.
  • Performed data transfers using BCP and BULK INSERT utilities.
  • Implemented Mirroring and Log Shipping for Disaster recovery.
  • Executed transactional and snapshot replication.
  • Performed all aspects of database administration, including data modeling, backups and recovery.
  • Experience in troubleshooting replication problems.
  • Checked database health using DBCC commands and DMVs (a brief sketch follows this list).
  • Generated server-side T-SQL scripts for data manipulation and validation, and created various snapshots and materialized views for remote instances.
  • Tuned stored procedures by adding TRY...CATCH blocks for error handling.
  • Tested to optimize the Stored Procedures and Triggers to be used in production.
  • Worked with Data Modeler and DBAs to build the data model and table structures. Actively participated in discussion sessions to design the ETL job flow.
  • Updated mappings, sessions and workflows as part of ETL changes; modified existing ETL code and documented the changes.
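
The DBCC/DMV health checks above could be automated roughly as in the following hedged Python/pyodbc sketch; the driver name, server, database and the chosen DMV query are placeholder assumptions, not the scripts actually used.

```python
# Hedged sketch: automate DBCC and DMV health checks against SQL Server via pyodbc.
# Connection details and the database name are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"     # assumed driver version
    "SERVER=sqlprod01;DATABASE=SalesDB;"          # placeholder server/database
    "Trusted_Connection=yes;",
    autocommit=True,                              # avoid an open transaction around DBCC
)
cursor = conn.cursor()

# Consistency check; corruption surfaces as a driver error from this statement.
cursor.execute("DBCC CHECKDB ('SalesDB') WITH NO_INFOMSGS")

# A simple DMV query: top waits as a coarse performance signal.
cursor.execute("""
    SELECT TOP 5 wait_type, wait_time_ms
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC
""")
for wait_type, wait_time_ms in cursor.fetchall():
    print(f"{wait_type}: {wait_time_ms} ms")

conn.close()
```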

Environment: MS SQL Server, Business Intelligence Development Studio (BIDS), SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), SQL scripts, Linux scripts, Unix operating system, Windows operating system, T-SQL, ETL, stored procedures, MS Access, MS Visio and MS Excel.

Confidential

Web Developer

Responsibilities:

  • Involved in designing user screens and validations using HTML, jQuery, Ext JS and JSP as per user requirements.
  • Created front-end interfaces and Interactive user experience using HTML, CSS, and JavaScript.
  • Responsible for validation of Client interface JSP pages using Struts form validations.
  • Used AJAX with JavaScript for richer graphics and page interactivity.
  • Used APIs and SOAP for transferring data and information between websites.
  • Worked on the Struts framework to create the web application.
  • Developed Servlets, JSP and Java Beans using Eclipse.
  • Designed and developed struts action classes for the controller responsibility.
  • Involved in the integration of Spring for implementing Dependency Injection (DI/IOC).
  • Responsible for Writing POJO, Hibernate-mapping XML Files, HQL.
  • Involved with the database design and creating relational tables.
  • Utilized Agile Scrum to manage full life-cycle development of the project.
  • Building and Deployment of EAR, WAR, JAR files on test, stage and production servers.
  • Involved with the version control and configuration management using SVN.

Environment: HTML, CSS, XML, DHTML, XHTML, DOM, POJO, SQL, SOAP, JSP, JavaScript, jQuery, AJAX, JSON, Eclipse.
