Sr. Big Data Consultant Resume

Basking Ridge, NJ

SUMMARY

  • Around 7.5 years of IT experience in software development and support, including developing strategic methods for deploying Big Data technologies to efficiently solve Big Data processing requirements.
  • Expertise in Hadoop ecosystem components HDFS, MapReduce, YARN, HBase, Pig, Sqoop and Hive for scalability, distributed computing and high-performance computing.
  • Experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like Cassandra and HBase.
  • Experienced in integrating Hadoop with Apache Storm and Kafka. Expertise in uploading clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Experience in using Hive Query Language for data Analytics.
  • Experienced in Installing, Maintaining and Configuring Hadoop Cluster.
  • Strong knowledge of creating and monitoring Hadoop clusters on Amazon EC2, VMs, Hortonworks Data Platform 2.1 & 2.2, and CDH3 and CDH4 with Cloudera Manager, on Linux distributions such as Ubuntu.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Expertise in the Scala programming language and Spark Core.
  • Experienced in job workflow scheduling and coordination tools like Oozie and ZooKeeper.
  • Good knowledge of Amazon EMR, S3 buckets, DynamoDB and Redshift.
  • Analyze data, interpret results and convey findings in a concise and professional manner
  • Partner with Data Infrastructure team and business owners to implement new data sources and ensure consistent definitions are used in reporting and analytics
  • Promote full cycle approach including request analysis, creating/pulling dataset, report creation and implementation and providing final analysis to the requestor
  • Very Good understanding of SQL, ETL and Data Warehousing Technologies
  • Knowledge of MS SQL Server 2012/2008/2005, Oracle 11g/10g/9i and E-Business Suite.
  • Expert in T-SQL, creating and using stored procedures, views and user-defined functions, and implementing Business Intelligence solutions using SQL Server 2000/2005/2008.
  • Developed Web-Services module for integration using SOAP and REST.
  • Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, Cosmos.
  • Knowledge of the Java Virtual Machine (JVM) and multithreaded processing.
  • Strong programming skills in designing and implementing applications using Core Java, J2EE, JDBC, JSP, HTML, Spring Framework, Spring Batch, Spring AOP, Struts, JavaScript and Servlets.
  • Java Developer with extensive experience with various Java libraries, APIs and frameworks.
  • Hands-on development experience with RDBMS, including writing complex SQL queries, stored procedures and triggers.
  • Sound knowledge of designing data warehousing applications using tools like Teradata, Oracle and SQL Server.
  • Experience using the Talend ETL tool.
  • Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
  • Strong communication, collaboration & team building skills with proficiency at grasping new Technical concepts quickly and utilizing them in a productive manner.
  • Adept in analyzing information system needs, evaluating end-user requirements, custom designing solutions and troubleshooting information systems.
  • Strong analytical and problem-solving skills.

TECHNICAL SKILLS:

Big Data Platforms: Apache Spark, Spark SQL, Spark Streaming, Amazon EMR, Redshift, Cloudera, Hadoop, YARN, MapReduce, Pig, Hive, HBase, Storm, Kafka, Impala, MongoDB and Cassandra

Languages: JAVA, J2EE, JSP, Servlets, Spring MVC, Spring MVC Portlet, Struts

Databases: Oracle 10g/9i/8i/8.0/7.0, MS SQL Server 6.5/7.0/2000/2003.

Tools and Products: Eclipse, Vignette Content Management systems, Documentum, ATG e Commerce and Team Connect.

Web: HTML, DHTML, JavaScript, JSP, XSL and XML.

Build Tools: Maven and Ant

Version Controls: Clear Case, StarTeam, Serena and SVN

Operating Systems: UNIX, Linux, Microsoft Windows 95/98/00/NT/XP, MS-DOS.

PROFESSIONAL EXPERIENCE:

Confidential, Basking Ridge, NJ

Sr. Big Data Consultant.

Responsibilities:

  • Ingested multiple sources into Hive warehouse tenant space for report generation.
  • Worked on HL7 data and parsed it using the Spark RDD and DataFrame APIs.
  • Created Oozie workflows for automation and scheduling that run independently based on time and data availability.
  • Created the incremental framework with HBase control table and Entity Instance table.
  • Extracted transactional data from various policies of Confidential by writing MapReduce jobs and automating them with UNIX shell scripts.
  • Handled large datasets during ingestion using partitioning, Spark in-memory capabilities, broadcast variables, efficient joins and transformations.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Integrated Spark with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase and Hive.
  • Improved the performance of jobs running on large data by using Spark optimization techniques.
  • Used HBase with Spark, reading HBase data into Spark RDDs and performing computations on it.
  • Implemented Kafka messaging services to stream large data and insert into database.
  • Used HBase to store the majority of the data, which needed to be divided by region.
  • Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries.
  • Created the streaming pipeline with RabbitMQ and service calls.
  • Supported the automation testing team by integrating Spark with Cucumber.
  • Deployed the code with CI/CD tools like Jenkins and GitHub.
  • Streamed HL7 messages to RabbitMQ using Spark and Scala.
  • Ingested a large volume of XML data into the data lake using the incremental framework.

Environment: Hadoop, HBase, MapR, ORC, Map Reduce, RabbitMQ, Spark, Spark SQL, Kafka, Storm, HDFS, Hive, Sqoop, Oozie, Java, SQL, Shell script.

Confidential, Atlanta, GA

Big Data Consultant

Responsibilities:

  • Involved in the requirements and design phase to implement the DMF (Data Movement Flow) application, which ingests data from many sources into Hadoop.
  • Developed export jobs for IDW data to export into Teradata for BI reports.
  • Worked on the AWS platform for a real-time streaming data pipeline.
  • Designed utility jobs to move data into Amazon Redshift from Hortonworks in-house platform.
  • Used the Spark DataFrame API to process structured and semi-structured files and load them back into an S3 bucket.
  • Ingested a large number of JSON files into Hadoop with Spark jobs. Extracted Daily Sales, Hourly Sales and Product Mix of offers and loaded them into the Global Data Warehouse.
  • Used Oozie to automate data loading into the Hadoop Distributed File System and Control-M for job scheduling.
  • Created workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Processed large data sets using the Hadoop cluster. Data stored on HDFS was preprocessed and validated using Pig, then stored in the Hive warehouse, which enabled business analysts to get the required data from Hive.
  • Developed Hive queries to join clickstream data with relational data to determine how search guests interacted with the website.

Environment: Spark, Spark SQL, Kafka, ActiveMQ, Hadoop, Hortonworks, ORC, Parquet, MapReduce, Storm, HDFS, Hive, Sqoop, Oozie, Scala, Shell script.

Confidential, Basking Ridge, NJ

Sr. Hadoop/Spark Developer.

Responsibilities:

  • Designed, implemented and maintained applications that receive transaction-based and product mix data generated from the insurance policies.
  • Designed and developed various modules in the Hadoop Big Data platform and processed data using Spark Streaming, Spark SQL, MapReduce, Hive, Pig, Sqoop and Talend.
  • Designed, developed and tested a Spark application named ECI Builder, which is used by many applications, automated with shell scripts and scheduled using Talend TAC.
  • Migrated MapReduce jobs to Spark jobs and used the Spark SQL and DataFrame APIs to load structured and semi-structured data into Spark clusters.
  • Developed shell scripts and automated data management for end-to-end integration work.
  • Extracted transactional data from various policies of Confidential by writing MapReduce jobs and automating them with UNIX shell scripts.
  • Involved in the requirements and design phase to implement a streaming Lambda architecture for real-time streaming using Spark and Kafka.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Handled large datasets during ingestion using partitioning, Spark in-memory capabilities, broadcast variables, efficient joins and transformations.
  • Created Hive tables, and loaded and analyzed data using Hive queries.
  • Using Pig scripts, transformed and loaded data into HBase tables.
  • Coordinated and took part in client meetings to clarify the requirements for ingesting customer data for various policies.
  • Worked on ORC Hive tables and the MapR environment.

Environment: Hadoop, HBase, MapR, ORC, MapReduce, Spark, Spark SQL, Kafka, Storm, HDFS, Hive, Sqoop, Oozie, Java, SQL, Shell script.

Confidential, Austin, TX

Sr. Big Data Developer.

Responsibilities:

  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Migrated MapReduce jobs to Spark jobs to achieve better performance.
  • Created Hive external tables, loaded data into them and queried the data using HQL.
  • Managed data coming from different sources.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data, including joins and some pre-aggregations, before storing the data in HDFS.
  • Involved in the requirements and design phase to implement a streaming Lambda architecture for real-time streaming using Spark and Kafka.
  • Used the Spark DataFrame API to process structured and semi-structured files and load them back into an S3 bucket.
  • Automated and scheduled Sqoop jobs using Unix shell scripts.
  • Developed Scala and Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.3+ for data aggregation, queries and writing data back to the OLTP system directly or through Sqoop.
  • Created workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.

Environment: Hadoop, MapReduce, Spark, Spark SQL, Kafka, Storm, HDFS, Hive, Sqoop, Oozie, Java, SQL, Shell script.

Confidential, New York, NY

Java/Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Performed data backup and synchronization using Amazon Web Services.
  • Designed utility jobs to move data into Amazon Redshift from the Hortonworks in-house platform.
  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Configured Flume to transport web server logs into HDFS.
  • Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
  • Used Flume to collect, aggregate, and store web log data from different sources like web servers, mobile and network devices, and pushed it to HDFS.
  • Worked on Amazon Web Services as the primary cloud platform.
  • Migrated legacy and monolithic systems to Amazon Web Services using Packer, Terraform and Ansible.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
  • Loaded data from several flat-file sources to staging using Teradata MultiLoad and FastLoad.

Environment: Hadoop, MapReduce, HDFS, Hive, Apache Spark, Kafka, CouchDB, Flume, AWS, Cassandra, Java, Struts, Servlets, HTML, XML, SQL, J2EE, MRUnit, JUnit, JDBC, Eclipse.

Confidential

Software Engineer

Responsibilities:

  • Involved in designing the shares and cash modules using UML.
  • Effectively used the iterative waterfall software development methodology during this time-constrained project.
  • Used HTML and JSP for the web pages and JavaScript for client-side validation.
  • Created XML pages with DTDs for front-end functionality and information exchange.
  • Responsible for writing Java SAX parser programs.
  • Developed ANT build scripts to build and deploy the application in enterprise archive format (.ear).
  • Performed unit testing using JUnit and functional testing.
  • Used the JSON response format to retrieve data from web servers.
  • Used JDBC 2.0 extensively and wrote several SQL queries for data retrieval.
  • Prepared program specifications for the loans module and was involved in database design.

Environment: Java, J2EE, EJB 2.0, Servlets, JavaScript, OO, JSP, JNDI, Java Beans, WebLogic, XML, XSL, Eclipse, PL/SQL, Oracle 8i, HTML, DHTML, UML.
