
Big Data Specialist / Big Data Lead Resume

SUMMARY:

  • 8+ years of total IT experience, including 4.3 years of experience in Hadoop and Big Data, more than 3 years of experience with SAP BI/BO and SAP HANA, and 1 year of experience with SQL/PL-SQL.
  • Involved in the Software Development Life Cycle (SDLC) phases which include Analysis, Design, Implementation, Testing and Maintenance.
  • Strong technical, administration, and mentoring knowledge in ETL and Big Data/Hadoop technologies.
  • Hands-on experience with major components of the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, Kyvos, Apache Spark, YARN, Tez, and Apache Ranger.
  • Work experience with cloud infrastructure such as Amazon Web Services (AWS), using EC2 instances and EBS volumes, with POCs completed on S3 and Amazon EMR.
  • Imported data from various relational data stores to HDFS using Sqoop and Attunity to build a Hadoop data lake, and vice versa.
  • Created Tableau dashboards to monitor data lake health against the respective KPIs.
  • Hands-on experience using SparkSQL and MLlib.
  • Experience with Hive query execution on the Spark and Tez engines.
  • Experience in Spark with Scala, implementing RDDs and DataFrames and scheduling them with Oozie to automate processing.
  • Experienced in storing result data sets from Spark RDDs to Hive tables using HiveContext.
  • Installing, configuring, and managing Hadoop clusters and data science tools.
  • Managing the Hadoop distribution with Apache Ambari and Hue.
  • Manage security policies consistently across Hadoop components using Apache Ranger.
  • Experience in developing shell scripts for system management.
  • Experience in scheduling time driven and data driven Oozie workflows.
  • Experience in designing and executing Unix scripts to implement cron jobs that execute Hadoop jobs (MapReduce, Pig, etc.).
  • Exposure to NoSQL databases such as DynamoDB, MongoDB, and HBase.
  • Familiar with various data warehouse and data modeling techniques.
  • Hands on experience in developing programs in Scala language.
  • Hands on experience working with XML, JSON and ORCFile formats.
  • Hands-on experience in database design, defining entity relationships, database analysis, and programming SQL, stored procedures, and PL/SQL.
  • Worked on Extraction, Transformation, and Loading (ETL) of data from multiple sources like Flat files, XML files, and Databases.
  • Exposure to Apache Kafka, which aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
  • Hands-on experience developing highly complex Web Intelligence reports, SAP BusinessObjects Analysis for MS Office, and BusinessObjects Xcelsius dashboards.
  • Worked on both implementation and production support projects.
  • SAP BI/HANA Modeling - Procedural knowledge of InfoCube design, InfoObject maintenance, DSOs, MultiProviders, Transformations, Data Transfer Processes, and InfoPackages.
  • SAP BI Operations - Procedural knowledge of data load management, event processing, scheduling Start Processes and Process Chains, OLTP data extraction, loading (full/delta uploads), and monitoring.
  • SAP BI Reporting - Business Explorer Analyzer, Query Designer, and Analysis Office for Excel.
  • SAP BI Reporting - Working knowledge of Calculated Key Figures, Restricted Key Figures, Conditions, and Customer Exit Variables.
  • SAP ABAP - Start routines, End routines, and Field routines.
  • Worked with version control and collaboration tools like Git, Stash, Bitbucket, Jira, and Confluence to provide a common platform for all developers.
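The Spark-with-Scala workflow named in the bullets above (RDDs and DataFrames persisted to Hive via HiveContext, with partitioning controlled by hand) could be sketched roughly as follows. This is a minimal illustration, not code from the projects described: the application name, input path, column names, partition count, and table name are all invented, and a Spark 1.x-style HiveContext is assumed because that is the API the summary names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object SpendLoad {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spend-load"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // Hypothetical input: raw CSV spend records already landed in the data lake
    val raw = sc.textFile("hdfs:///data/lake/spend/2017-01-01")

    // RDD of (category, amount) pairs, repartitioned to control parallelism
    val parsed = raw
      .map(_.split(','))
      .filter(_.length >= 2)
      .map(f => (f(0), f(1).toDouble))
      .repartition(48)

    // Promote the RDD to a DataFrame and persist it to a Hive table
    val df = parsed.toDF("category", "amount")
    df.write.mode("overwrite").saveAsTable("analytics.spend_by_category")
  }
}
```

A job like this would then be wrapped in an Oozie workflow action so the daily load runs without manual intervention, matching the scheduling bullets above.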

TECHNICAL SKILLS:

Hadoop/Big Data platform: HDFS, Map Reduce, Spark, Hive, Pig, Oozie, Sqoop, Kafka, Ambari, Ranger

Hadoop distribution: Hortonworks, Cloudera, Amazon Elastic MapReduce (EMR).

Programming languages: C/C++, Java, JDBC, JSP, UNIX shell scripts, Python, Pig Latin, PL/SQL, Scala, JavaScript, SAP ABAP, and R.

Data Storage, Database, and Data Warehouse: SAP BI/BW, SAP HANA, Oracle 9i/10g/11g, MySQL, DB2, RDBMS, DynamoDB, MongoDB, HBase

Reporting Tools: Web Intelligence reports, SAP BusinessObjects Analysis for MS Office, BusinessObjects Xcelsius, Tableau.

Cloud: Amazon AWS (EC2, EBS, S3, DynamoDB)

PROFESSIONAL EXPERIENCE:

Confidential

Big data specialist / Big data Lead

Environment: HDP 2.0, HDP 2.5, Sqoop, HDFS, Hive, Hive UDF, Shell Script, Oozie, Kyvos, Attunity, Pig Latin, Apache Spark, SparkSQL, YARN, Tez, Scala, Apache Ambari, Apache Ranger, Amazon AWS, Oracle DB, SQL, PL/SQL, Tableau, SAP BI, and SAP HANA.

Responsibilities:

  • Involved in the requirement-gathering phase of the SDLC and helped break the project into modules together with the team lead.
  • Imported data from various relational data stores to HDFS using Sqoop and Attunity to build the Hadoop data lake.
  • Worked on Verizon spend classification.
  • Created Tableau dashboards to monitor data lake health against the respective KPIs.
  • Built Tableau dashboards on Hive tables to get insights into company spend data.
  • Used Apache Sqoop to load incremental user data into HDFS on a daily basis.
  • Worked on migration from SAP HANA/BI to Hadoop.
  • Implemented RDDs and DataFrames in Spark with Scala and scheduled them using Oozie to automate processing.
  • Stored result data sets from Spark RDDs to Hive tables using HiveContext.
  • Analyzed and optimized RDDs by controlling partitions for the given data.
  • Used HiveQL to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Queried data using SparkSQL on top of the Spark engine.
  • Good understanding of the DAG cycle for the entire Spark application flow via the Spark application WebUI.
  • Created Hive tables as per requirement as internal or external tables, intended for efficiency.
  • Worked on scheduling workflows in Oozie to automate and parallelize Hive and Pig jobs
  • Wrote customized Hive UDFs in Java where the required functionality was too complex for built-in functions.
  • Worked on Hive performance techniques such as partitioning, bucketing, vectorization, cost-based query optimization, and Tez execution.
  • Created and maintained the Hive warehouse for Hive analysis.
  • Developed dynamic partitioned Hive tables to store data by date.
  • Analyzed large amounts of data sets by writing Pig Scripts.
  • Created users, groups, and policies in Apache Ranger to provide security and access to the data.
  • Implemented row-level security and data masking for Hive table data using Apache Ranger and custom logic.
  • Worked on Kyvos to store data in a multidimensional data model for BI reporting.
  • Exported data to Oracle tables to get insights using Oracle Endeca.
  • Exported data to another Hadoop environment for the Machine Learning team using the distcp command.
  • Worked on Amazon Web Services (AWS), a complete set of infrastructure and application services for running enterprise applications and big data projects in the cloud.
  • Continuously monitored and managed Hadoop cluster using Apache Ambari.
  • Wrote shell scripts to load data from the local file system to HDFS and to build Hive query structures to run in the Hive shell.
  • Completed POCs on the NoSQL databases AWS DynamoDB and MongoDB.
  • Performed POCs using newer technologies such as Kafka and Amazon EMR.
  • Set up a 12-node cluster for POCs on Amazon Web Services using EC2 and EBS.
  • Worked with version control and collaboration tools like Git, Stash, Bitbucket, Jira, and Confluence.
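The Hive tuning bullets above (dynamic date partitions, bucketing, ORC storage) might look roughly like the following when driven through the same HiveContext used elsewhere in this role. Everything here is a hedged sketch: the schemas, table names, paths, and bucket count are assumptions for illustration, not details from the project.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object SpendDdl {
  def main(args: Array[String]): Unit = {
    val hc = new HiveContext(new SparkContext(new SparkConf().setAppName("spend-ddl")))

    // Allow fully dynamic partitions for the date-partitioned insert below
    hc.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // External table over the raw landing zone (illustrative schema and path)
    hc.sql("""CREATE EXTERNAL TABLE IF NOT EXISTS raw_spend (
                vendor STRING, category STRING, amount DOUBLE, load_date STRING)
              ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
              LOCATION 'hdfs:///data/lake/spend/raw'""")

    // Managed ORC table, partitioned by date and bucketed for join performance
    hc.sql("""CREATE TABLE IF NOT EXISTS spend_orc (
                vendor STRING, category STRING, amount DOUBLE)
              PARTITIONED BY (load_date STRING)
              CLUSTERED BY (vendor) INTO 16 BUCKETS
              STORED AS ORC""")

    // Dynamic-partition insert: each distinct load_date becomes its own partition
    hc.sql("""INSERT OVERWRITE TABLE spend_orc PARTITION (load_date)
              SELECT vendor, category, amount, load_date FROM raw_spend""")
  }
}
```

Storing the managed copy as partitioned, bucketed ORC is what makes the vectorization and cost-based optimization mentioned above pay off at query time.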

Hadoop Developer

Confidential

Environment: HDP 2.0, Sqoop, HDFS, Hive, Hive UDF, Oozie, Apache Ambari, Ranger, Shell Script, Oracle DB, SQL, PL/SQL, SAP BO, SAP BI.

Responsibilities:

  • Used Sqoop to import data from relational databases into HDFS for processing and to export processed data back to the RDBMS; wrote and reviewed technical design documents.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Created Hive tables as per requirement as internal or external tables, intended for efficiency.
  • Built, tuned, and maintained HiveQL and Pig scripts for reporting.
  • Worked on Hive performance techniques such as partitioning, bucketing, vectorization, and cost-based query optimization.
  • Worked on Hive UDFs to implement custom functions in Java.
  • Worked on performance tuning of Hive and Pig queries.
  • Exported data to Oracle table to get insights using reporting tools.
  • Worked on different file formats and different Compression Codecs.
  • Worked on scheduling workflows in Oozie to automate and parallelize Hive jobs.
  • Exported data to another Hadoop environment for the Machine Learning team using the distcp command.
  • Continuously monitored and managed Hadoop cluster using Apache Ambari.
  • Wrote shell scripts to load data from the local file system to HDFS and to build Hive query structures to run in the Hive shell.
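The custom Hive UDF work described above follows Hive's simple-UDF pattern: extend the `UDF` base class and expose an `evaluate` method. The resume's UDFs were written in Java; purely as a hedged sketch, the same pattern in Scala (the function name, normalization logic, and jar name below are invented) looks like:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: normalize free-text vendor names before aggregation.
class NormalizeVendor extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    // Lowercase, trim, and strip punctuation so variant spellings group together
    val cleaned = input.toString.trim.toLowerCase.replaceAll("[^a-z0-9 ]", "")
    new Text(cleaned)
  }
}
// Registered in Hive with, e.g.:
//   ADD JAR normalize-vendor.jar;
//   CREATE TEMPORARY FUNCTION normalize_vendor AS 'NormalizeVendor';
```

Once registered, the function can be called inline in HiveQL like any built-in, which is what makes the pattern useful for logic too complex to express in plain SQL.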

Senior SAP BI/IP/BO Developer

Confidential

Environment: SAP BI 7.3, BO 4.0, Oracle 10g, SQL, PL/SQL, SAP ABAP, SAP WebI, SAP BEx.

Responsibilities:

  • Extracted data cost-center-wise from the 0CO_OM_CCA_9 DataSource.
  • Developed profitability reports on controlling data extracted using 0CO_OM_WBS_6 and 0CO_OM_NWA_2.
  • Modeling: created real-time InfoCubes and MultiProviders.
  • Created transformations and DTPs (for DSOs).
  • Created InfoCubes, DSOs, and MultiProviders as per the Blueprint phase.
  • Wrote Start routines, End routines, and Field routines in SAP ABAP.
  • Worked on various objects such as RKFs and CKFs per client requirements.
  • Developed various Aggregation Levels on top of real-time InfoProviders.
  • Created various Filters for restricting data on the planning modeler side.
  • Defined Planning Sequences for executing the planning functions.
  • Developed BEx queries in change mode by building them on Aggregation Levels.
  • Developed input-ready BEx queries on top of MultiProviders, consumed through Analysis for MS Office.
  • Involved in report testing and bug fixing.
  • Designed process chains for weekly, monthly loads as per the business requirement.
  • Developed a generic extractor on a custom function module to extract WBS partner details from the IHPA and PRPS tables, with delta enabled using creation date and creation time.
  • Developed Gross Margin and Direct Margin WebI reports at AVGG and BUD exchange rates.
  • Extended all standard BO audit reports to the Aricent landscape and developed custom reports.
  • Involved in production support activities such as ticket analysis (open/close) reporting.
  • Actively involved in rectifying load failure errors in master data loads and transactional loads.
  • Communicated with the business to identify and resolve issues.
  • Wrote scripts for inserting and deleting data in the database.
  • Wrote Oracle stored procedures and functions to update data in sources for reports.
  • Understood client requirements and project functionalities.
