
Senior Big Data Consultant Resume


Bridgewater, NJ

PROFESSIONAL SUMMARY:

  • Around 10 years of IT experience across various domains with Big Data ecosystems, Core Java and SQL & PL/SQL technologies, with hands-on project experience in several verticals, including financial services and trade compliance.
  • Extensive hands-on experience in Spark Core, Spark SQL, Spark Streaming and Spark machine learning using the Scala and Python programming languages.
  • Solid understanding of RDD operations in Apache Spark, i.e., transformations & actions, persistence (caching), accumulators, broadcast variables and optimizing broadcasts (see the sketch after this list).
  • In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAGScheduler, TaskScheduler, stages and tasks.
  • Experience in exposing Apache Spark as web services.
  • Good understanding of the Driver, Executors and the Spark web UI.
  • Experience in submitting Apache Spark jobs and MapReduce jobs to YARN.
  • Experience in real-time processing using Apache Spark, Flume and Kafka.
  • Migrated Python Machine learning modules to scalable, high performance and fault-tolerant distributed systems like Apache Spark.
  • Strong experience in Spark SQL UDFs, Hive UDFs and Spark SQL performance tuning. Hands-on experience working with input file formats like ORC, Parquet, JSON and Avro.
  • Good expertise in coding in Python, Scala and Java.
  • Good understanding of the MapReduce framework architectures (MRv1 & YARN).
  • Good knowledge and understanding of Hadoop architecture and the various components in the Hadoop ecosystem - HDFS, MapReduce, Pig, Sqoop and Hive.
  • Developed various MapReduce applications to perform ETL workloads on metadata and terabytes of data.
  • Hands-on experience in cleansing semi-structured and unstructured data using Pig Latin scripts.
  • Good working knowledge of creating Hive tables and using HQL for data analysis to meet business requirements.
  • Experience in managing and reviewing Hadoop log files.
  • Good working experience with NoSQL databases like HBase, Cassandra and MongoDB.
  • Responsible for managing data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
  • Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems / mainframe and vice-versa.
  • Experience in working with Flume to load log data from multiple sources directly into HDFS.
  • Experience in scheduling time driven and data driven Oozie workflows.
  • Used ZooKeeper with distributed HBase for cluster configuration and management.
  • Worked with Avro Data Serialization system.
  • Experience in fine-tuning MapReduce jobs for better scalability and performance.
  • Experience in writing shell scripts to dump shared data from landing zones to HDFS.
  • Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
  • Expertise in client-side design and validation using HTML and JavaScript.
  • Excellent communication and interpersonal skills; a detail-oriented, analytical, time-bound and responsible team player with the ability to coordinate in a team environment, a high degree of self-motivation and the ability to learn quickly.
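
A minimal Scala sketch of the RDD concepts listed above (lazy transformations vs. actions, persistence/caching and a broadcast variable); the input path, field layout and lookup map are illustrative only:

    import org.apache.spark.sql.SparkSession

    object RddBasicsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("rdd-basics-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Broadcast a small lookup map once per executor instead of shipping it with every task
        val countryLookup = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))

        // Transformations are lazy: nothing runs until an action is called
        val lines  = sc.textFile("hdfs:///data/trades.csv")            // hypothetical input path
        val parsed = lines.map(_.split(","))
          .filter(_.length >= 3)                                       // drop malformed records
          .map(f => (countryLookup.value.getOrElse(f(1), "Unknown"), f(2).toDouble))

        // Persist (cache) because the RDD is reused by two separate actions below
        parsed.cache()

        // Actions trigger the DAG to be split into stages and tasks by the schedulers
        val rowCount       = parsed.count()
        val totalByCountry = parsed.reduceByKey(_ + _).collect()

        println(s"rows=$rowCount")
        totalByCountry.foreach { case (country, total) => println(s"$country -> $total") }
        spark.stop()
      }
    }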

TECHNICAL SKILLS:

Big Data Frameworks: Hadoop, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera, MongoDB, Spark, Scala.

Big data distribution: Cloudera, Amazon EMR

Programming languages: Oracle PL/SQL, Core Java, Scala, Python, Shell Scripting

Operating Systems: Windows, Linux (Ubuntu)

Databases: Oracle 10g, MySQL, Netezza, SQL Server, Teradata, PostgreSQL

Development Tools: Eclipse, PL/SQL Developer, Toad, PuTTY

Development methodologies: Agile, Waterfall

Messaging Services: ActiveMQ, Kafka, JMS

Version Control Tools: PVCS, SVN, CVS, Git

Analytics: Tableau, SPSS, SAS EM and SAS JMP

PROFESSIONAL EXPERIENCE:

Senior Big Data Consultant

Confidential, Bridgewater, NJ

Responsibilities:

  • Analyze business requirements including problems reported as part of maintenance of applications and work closely with project team members, technical leads and business partners to arrive at an optimal solution design
  • Created SnappyData external and internal tables as per the business requirements (see the sketch after this list).
  • Troubleshot error messages indicating problems that occur when setting up the SnappyData cluster or when running queries.
  • Involved in defect fixing in SnappyData.
  • Manage and monitor the SnappyData cluster.
  • Involved in data validation for Tableau reports generation.
  • Build a scalable framework using SnappyData's advanced features.
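
A minimal sketch of the internal (managed) vs. external table distinction referenced above, expressed here as Spark SQL DDL since SnappyData exposes a Spark-compatible SQL layer; the exact DDL used on the project, as well as the table names and paths, are assumptions:

    import org.apache.spark.sql.SparkSession

    object TableDdlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("table-ddl-sketch")
          .enableHiveSupport()              // managed-table metadata kept in the metastore
          .getOrCreate()

        // External table: data stays at its original location; dropping the table keeps the files
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS trades_raw (
            |  trade_id STRING, symbol STRING, amount DOUBLE)
            |STORED AS PARQUET
            |LOCATION 'hdfs:///landing/trades'""".stripMargin)         // hypothetical path

        // Internal (managed) table: the warehouse owns the data; dropping the table removes it
        spark.sql(
          """CREATE TABLE IF NOT EXISTS trades_curated (
            |  symbol STRING, total_amount DOUBLE)
            |STORED AS PARQUET""".stripMargin)

        spark.sql(
          """INSERT OVERWRITE TABLE trades_curated
            |SELECT symbol, SUM(amount) FROM trades_raw GROUP BY symbol""".stripMargin)

        spark.stop()
      }
    }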

Environment: Linux, Hadoop, Spark core, Snappy data, Tableau

Big Data Senior Consultant

Confidential, Ashburn, VA

Responsibilities:

  • Wrote Spark jobs to read data into DataFrames and apply various transformations and actions to filter and transform the data into the required format (see the sketch after this list).
  • Built a scalable framework using Spark's advanced APIs.
  • Wrote Spark jobs to write the final data into HDFS and an RDBMS.
  • Managed and monitored the Hadoop cluster.
  • Managed the ETL team and motivated/educated them to learn and work efficiently.
  • Migrated Hive queries into Spark SQL to improve performance.
  • Executed Spark RDD transformations and actions as per business analysis needs.
  • Imported data from MySQL to HDFS using Sqoop and managed Hadoop log files.
  • Fully automated job scheduling, monitoring, and cluster management without human intervention.
  • Used Sqoop to import and export data among HDFS, MySQL database and Hive.
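
A minimal Scala sketch of this read-transform-write pattern; the input layout, column names, output paths and JDBC connection details are all hypothetical:

    import java.util.Properties

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object EtlJobSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("etl-job-sketch").getOrCreate()

        // Read the raw feed into a DataFrame (schema inference used for brevity)
        val raw = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///landing/orders/*.csv")

        // Transformations: drop bad rows and reshape into the required output format
        val cleaned = raw
          .filter(col("order_id").isNotNull && col("amount") > 0)
          .withColumn("order_date", to_date(col("order_ts")))
          .select("order_id", "customer_id", "order_date", "amount")

        // Write the final data set to HDFS as Parquet, partitioned for downstream queries
        cleaned.write.mode("overwrite").partitionBy("order_date")
          .parquet("hdfs:///curated/orders")

        // Also persist a copy to the RDBMS over JDBC (credentials come from the environment)
        val props = new Properties()
        props.setProperty("user", "etl_user")
        props.setProperty("password", sys.env.getOrElse("DB_PASSWORD", ""))
        cleaned.write.mode("append")
          .jdbc("jdbc:mysql://db-host:3306/reporting", "orders_curated", props)

        spark.stop()
      }
    }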

Environment: Linux, Hadoop, Spark core, Spark SQL, Scala, Hive.

Big Data Consultant

Confidential

Responsibilities:

  • Load and transform large sets of structured, semi structured and unstructured data coming from different source systems and a variety of portfolios
  • Used Spark data frame to read text data, CSV data, and image data from HDFS, S3 and Hive.
  • Worked closely with data scientists on building predictive models using Spark.
  • Cleaned input text data using the Spark machine learning feature extraction API.
  • Migrated Hive queries into Spark SQL to improve performance.
  • Involved in migrating code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
  • Trained model using historical data stored in HDFS and Amazon S3.
  • Used Spark Streaming to load the trained model and predict on real-time data from Kafka (see the sketch after this list).
  • Executed Spark RDD transformations and actions as per business analysis needs.
  • Imported data from MySQL to HDFS using Sqoop and managed Hadoop log files.
  • Fully automated job scheduling, monitoring, and cluster management without human intervention.
  • Created Hive tables and involved in meta data loading and writing Hive UDFs
  • Used Sqoop to import and export data among HDFS, MySQL database and Hive.
  • Migrated Python scikit-learn machine learning code to DataFrame-based Spark ML algorithms.
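
A minimal Scala sketch of the streaming-prediction flow described above, assuming a pipeline model saved to HDFS, a Kafka topic named "events" and one raw text field per message; all names, paths and parameters are illustrative:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object StreamingScoringSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("streaming-scoring-sketch").getOrCreate()
        import spark.implicits._
        val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

        // Model trained offline on historical data in HDFS/S3 and saved as a Pipeline
        val model = PipelineModel.load("hdfs:///models/scoring_pipeline")   // hypothetical path

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "kafka-broker:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "scoring-job",
          "auto.offset.reset"  -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Score each micro-batch with the trained pipeline and append the predictions
        stream.map(_.value).foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            val batch = rdd.toDF("text")              // assumes one raw text field per message
            model.transform(batch)
              .select("text", "prediction")
              .write.mode("append").parquet("hdfs:///predictions/events")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }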

Environment: Spark Core, Spark SQL, Spark Streaming, Spark Machine Learning, Scala, DataFrames, Datasets, AWS, Kafka, Hive, Sqoop, HBase, GitHub, Webflow, Amazon S3, Amazon EMR.

Big Data Associate Consultant

Confidential

Responsibilities:

  • Created various MapReduce jobs for performing ETL transformations on the transactional and application-specific data sources.
  • Imported data from our relational data stores to Hadoop using Sqoop
  • Wrote Pig scripts and executed them using the Grunt shell.
  • Worked on the conversion of existing MapReduce batch applications for better performance.
  • Performed big data analysis using Pig and user-defined functions (UDFs).
  • Worked on loading tables to Impala for faster retrieval using different file formats.
  • The system was initially developed in Java; the Java filtering program was restructured to put the business rule engine in a jar that can be called from both standalone Java and Hadoop jobs.
  • Created Reports and Dashboards using structured and unstructured data.
  • Upgraded the operating system and/or Hadoop distribution when new versions were released, using Puppet.
  • Performed joins, group-bys and other operations in MapReduce using Java and Pig (see the sketch after this list).
  • Processed the output from Pig and Hive and formatted it before writing it to the Hadoop output file.
  • Used Hive table definitions to map the output files to tables.
  • Setup and benchmarked Hadoop/HBase clusters for internal use
  • Wrote data ingesters and MapReduce programs.
  • Reviewed the HDFS usage and system design for future scalability and fault-tolerance
  • Wrote MapReduce/HBase jobs
  • Worked with HBase, a NoSQL database.
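
A minimal sketch of the ETL-style group-by pattern described above; the original jobs were written in Java, shown here in Scala against the same Hadoop MapReduce API, and the record layout is hypothetical:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{DoubleWritable, LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
    import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}

    import scala.collection.JavaConverters._   // scala.jdk.CollectionConverters on Scala 2.13+

    // Mapper: emits (accountId, amount) pairs from comma-delimited transaction records
    class TxnMapper extends Mapper[LongWritable, Text, Text, DoubleWritable] {
      private val outKey = new Text()
      private val outVal = new DoubleWritable()

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, Text, DoubleWritable]#Context): Unit = {
        val fields = value.toString.split(",")
        if (fields.length >= 3) {             // skip malformed rows
          outKey.set(fields(0))               // hypothetical column 0 = account id
          outVal.set(fields(2).toDouble)      // hypothetical column 2 = amount
          context.write(outKey, outVal)
        }
      }
    }

    // Reducer: groups by account id and sums the amounts
    class TxnReducer extends Reducer[Text, DoubleWritable, Text, DoubleWritable] {
      override def reduce(key: Text, values: java.lang.Iterable[DoubleWritable],
                          context: Reducer[Text, DoubleWritable, Text, DoubleWritable]#Context): Unit = {
        val total = values.asScala.map(_.get).sum
        context.write(key, new DoubleWritable(total))
      }
    }

    // Driver: wires the job together and submits it to the cluster
    object TxnAggregateJob {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "txn-aggregate")
        job.setJarByClass(getClass)
        job.setMapperClass(classOf[TxnMapper])
        job.setReducerClass(classOf[TxnReducer])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[DoubleWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }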

Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Linux, Java 7, Eclipse, NoSQL.

Big Data Associate Consultant

Confidential

Responsibilities:

  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Installed and configured Apache Hadoop, Hive, and HBase.
  • Worked on Hortonworks cluster, which was used to process the big data.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Used Sqoop to pull data from the RDBMS into HDFS and vice versa.
  • Defined workflows using Oozie.
  • Used Hive to create partitions on Hive tables and analyzed this data to compute various metrics for reporting.
  • Created Data model for Hive tables
  • Managed and reviewed Hadoop log files.
  • Used Pig as an ETL tool to perform transformations, joins and pre-aggregations before loading data onto HDFS.
  • Worked on large sets of structured, semi structured and unstructured data
  • Responsible to manage data coming from different sources
  • Installed and configured Hive and developed Hive UDFs to extend its core functionality (see the sketch after this list).
  • Responsible for loading data from UNIX file systems to HDFS.
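
A minimal sketch of a Hive UDF of the kind described above; the project's actual UDFs were separate (likely Java) code, so the class, function name and column used here are purely illustrative:

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical UDF that trims and upper-cases a code column before reporting
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase)
      }
    }

    // Registered from the Hive CLI roughly like:
    //   ADD JAR normalize-code.jar;
    //   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
    //   SELECT normalize_code(country_code) FROM customers;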

Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, HBase, Pig, Oozie, Unix, Java 7, Eclipse.

Associate Consultant

Confidential

Responsibilities:

  • Full life cycle experience including requirements analysis, high level design, detailed design, data model design, coding, testing and creation of functional and technical design documentation.
  • Extensively involved in writing stored procedures, functions, packages as per the business requirements.
  • Redesigned existing procedures and packages to enhance the performance.
  • Debugged Pro*C and PL/SQL code blocks of stored procedures.
  • Generated ad-hoc reports using SQL and stored procedures.
  • Involved in the continuous enhancements and fixing of production problems.
  • Analyzed CRs raised from UAT and Production.
  • Coordinated with the UAT and Production teams as well as with the users.
  • Used Bulk Collections for better performance and easy retrieval of data, by reducing context switching between SQL and PL/SQL engines.
  • Wrote SQL, PL/SQL, SQL*Plus programs required to retrieve data using cursors and exception handling
  • Involved in SIT and UAT Support for solving critical issues.
  • Involved in the requirements, design, coding and testing phases of the functionality.
  • Created and maintained database objects.
  • Performed end-to-end functional testing of the entire application.

Environment: Oracle 11g, SQL, PL/SQL, Pro*C, PuTTY, Sun Solaris.

Software Engineer

Confidential

Responsibilities:

  • Involved in writing stored procedures, functions, packages as per the business requirements.
  • Developed Pro*C programs for flat file generation.
  • Worked on Request for Changes (RFC) and Production Problem Resolutions (PPR).
  • Provided support across the various phases of the project.
  • Prepared and executed unit test cases.

Environment: Oracle 11g, SQL, PL/SQL, Pro*C, PuTTY, Sun Solaris
