Sr. Big Data Consultant Resume
Basking Ridge, NJ
SUMMARY:
- Around 7.5 years of IT experience in software development and support, including developing strategic methods for deploying Big Data technologies to efficiently solve large-scale data processing requirements.
- Expertise in Hadoop ecosystem components (HDFS, MapReduce, YARN, HBase, Pig, Sqoop and Hive) for scalable, distributed, high-performance computing.
- Experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases such as Cassandra and HBase (see the Spark SQL sketch after this list).
- Experienced in integrating Hadoop with Apache Storm and Kafka; expertise in loading clickstream data from Kafka into HDFS, HBase and Hive through Storm.
- Experience in using Hive Query Language (HQL) for data analytics.
- Experienced in installing, configuring and maintaining Hadoop clusters.
- Strong knowledge of creating and monitoring Hadoop clusters on Amazon EC2 and VMs, using Hortonworks Data Platform 2.1/2.2 and Cloudera CDH3/CDH4 with Cloudera Manager on Linux and Ubuntu.
- Capable of processing large sets of structured, semi-structured and unstructured data, and of supporting systems application architecture.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Expertise in the Scala programming language and Spark Core.
- Experienced with job workflow scheduling, coordination and monitoring tools such as Oozie and ZooKeeper.
- Good knowledge of Amazon EMR, S3, DynamoDB and Redshift.
- Analyze data, interpret results and convey findings in a concise and professional manner.
- Partner with the data infrastructure team and business owners to implement new data sources and ensure consistent definitions are used in reporting and analytics.
- Promote a full-cycle approach: request analysis, dataset creation and extraction, report creation and implementation, and delivery of the final analysis to the requestor.
- Very good understanding of SQL, ETL and data warehousing technologies.
- Knowledge of MS SQL Server 2012/2008/2005, Oracle 11g/10g/9i and Oracle E-Business Suite.
- Expert in T-SQL, creating and using stored procedures, views and user-defined functions, and implementing business intelligence solutions using SQL Server 2000/2005/2008.
- Developed web services modules for integration using SOAP and REST.
- Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6, Ubuntu 13/14 and Cosmos.
- Knowledge of the Java Virtual Machine (JVM) and multithreaded processing.
- Strong programming skills in the design and implementation of applications using Core Java, J2EE, JDBC, JSP, HTML, Spring Framework, Spring Batch, Spring AOP, Struts, JavaScript and Servlets.
- Java developer with extensive experience across various Java libraries, APIs and frameworks.
- Hands-on development experience with RDBMSs, including writing complex SQL queries, stored procedures and triggers.
- Sound knowledge of designing data warehousing applications using tools such as Teradata, Oracle and SQL Server.
- Experience using the Talend ETL tool.
- Strong understanding of Agile (Scrum) and Waterfall SDLC methodologies.
- Strong communication, collaboration and team-building skills, with proficiency at grasping new technical concepts quickly and applying them productively.
- Adept at analyzing information system needs, evaluating end-user requirements, custom-designing solutions and troubleshooting information systems.
- Strong analytical and problem-solving skills.
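A minimal Spark SQL sketch in Scala of the Hive-backed analytics referenced above; the table and column names (analytics.clickstream, page) are hypothetical:

    import org.apache.spark.sql.SparkSession

    object ClickstreamSummary {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("clickstream-summary")
          .enableHiveSupport() // query tables registered in the Hive metastore
          .getOrCreate()
        import spark.implicits._

        // hypothetical Hive table of clickstream events
        val clicks = spark.table("analytics.clickstream")
        clicks.groupBy("page").count().orderBy($"count".desc).show(20)
        spark.stop()
      }
    }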
TECHNICAL SKILLS:
Big Data Platforms: Apache Spark, Spark SQL, Spark Streaming, Amazon EMR, Redshift, Cloudera, Hadoop, YARN, MapReduce, Pig, Hive, HBase, Storm, Kafka, Impala, MongoDB and Cassandra
Languages and Frameworks: Java, J2EE, JSP, Servlets, Spring MVC, Spring MVC Portlet, Struts
Databases: Oracle 10g/9i/8i/8.0/7.0, MS SQL Server 6.5/7.0/2000/2003
Tools and Products: Eclipse, Vignette Content Management System, Documentum, ATG eCommerce and TeamConnect
Web: HTML, DHTML, JavaScript, JSP, XSL and XML
Build Tools: Maven and Ant
Version Control: ClearCase, StarTeam, Serena and SVN
Operating Systems: UNIX, Linux, Microsoft Windows 95/98/2000/NT/XP, MS-DOS
PROFESSIONAL EXPERIENCE:
Confidential, Basking Ridge, NJ
Sr. Big Data Consultant
Responsibilities:
- Ingested data from multiple sources into the Hive warehouse tenant space for report generation.
- Worked on HL7 data, parsing it with the Spark RDD and DataFrame APIs.
- Created Oozie workflows for automation and scheduling that run independently based on time and data availability.
- Created an incremental ingestion framework using an HBase control table and an entity instance table.
- Extracted transactional data from various Confidential policies by writing MapReduce jobs and automating them with UNIX shell scripts.
- Handled large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and automate several job types, including Java MapReduce, Hive, Pig and Sqoop.
- Integrated Spark with Kafka to perform web analytics, loading clickstream data from Kafka into HDFS, HBase and Hive.
- Improved the performance of jobs running on large data volumes using Spark optimization techniques.
- Used HBase efficiently with Spark, reading HBase data into Spark RDDs and performing computations on them (see the sketch after this list).
- Implemented Kafka messaging services to stream large data volumes and insert them into the database.
- Used HBase to store the majority of the data, which needed to be partitioned by region.
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
- Created a streaming pipeline with RabbitMQ and service calls.
- Supported the test automation team in integrating Spark with Cucumber.
- Deployed code with CI/CD tools such as Jenkins and GitHub.
- Streamed HL7 messages to RabbitMQ using Spark and Scala.
- Ingested a high volume of XML data into the data lake with the incremental framework.
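A minimal Scala sketch of reading an HBase table into a Spark RDD as described above; the table name (entity_instance) is hypothetical:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    object HBaseToRdd {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hbase-to-rdd"))
        val hbaseConf = HBaseConfiguration.create()
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "entity_instance") // hypothetical control table

        // scan the table as an RDD of (rowkey, Result) pairs
        val rows = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
          classOf[ImmutableBytesWritable], classOf[Result])

        println(s"rows scanned: ${rows.count()}")
        sc.stop()
      }
    }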
Environment: Hadoop, HBase, MapR, ORC, Map Reduce, RabbitMQ, Spark, Spark SQL, Kafka, Storm, HDFS, Hive, Sqoop, Oozie, Java, SQL, Shell script.
Confidential, Atlanta, GA
Big Data Consultant
Responsibilities:
- Involved in the requirements and design phases of the DMF (Data Movement Flow) application, which ingests data from many sources into Hadoop.
- Developed export jobs to move IDW data into Teradata for BI reports.
- Worked on the AWS platform for a real-time streaming data pipeline.
- Designed utility jobs to move data from the in-house Hortonworks platform into Amazon Redshift.
- Used the Spark DataFrame API to process structured and semi-structured files and load them back into an S3 bucket (see the sketch after this list).
- Ingested large volumes of JSON files into Hadoop with Spark jobs; extracted daily sales, hourly sales and offer product mix, and loaded them into the global data warehouse.
- Used Oozie to automate data loading into HDFS, and Control-M for job scheduling.
- Created workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Processed large datasets on the Hadoop cluster: data stored on HDFS was preprocessed and validated using Pig, then loaded into the Hive warehouse, enabling business analysts to retrieve the data they needed from Hive.
- Developed Hive queries joining clickstream data with relational data to determine how search guests interacted with the website.
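A minimal Scala sketch of the DataFrame-based JSON-to-S3 flow described above; the paths, bucket and column names are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    object DailySalesToS3 {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("daily-sales-to-s3").getOrCreate()

        // semi-structured JSON sales files; Spark infers the schema
        val sales = spark.read.json("hdfs:///data/sales/json/") // hypothetical path
        val daily = sales.groupBy("store_id", "sale_date").agg(sum("amount").as("daily_total"))

        // write the aggregate back to S3 as Parquet
        daily.write.mode("overwrite").parquet("s3a://example-bucket/daily_sales/") // hypothetical bucket
        spark.stop()
      }
    }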
Environment: Spark, Spark SQL, Kafka, ActiveMQ, Hadoop, Hortonworks, ORC, Parquet, MapReduce, Storm, HDFS, Hive, Sqoop, Oozie, Scala, Shell script.
Confidential, Basking Ridge, NJ
Sr. Hadoop/Spark Developer
Responsibilities:
- Designed, implemented and maintained applications that receive transaction and product-mix data generated from insurance policies.
- Designed and developed various modules on the Hadoop big data platform, processing data using Spark Streaming, Spark SQL, MapReduce, Hive, Pig, Sqoop and Talend.
- Designed, developed and tested a Spark application named ECI Builder, used by many applications, automated with shell scripts and scheduled using Talend TAC.
- Migrated MapReduce jobs to Spark jobs, and used the Spark SQL and DataFrame APIs to load structured and semi-structured data into Spark clusters.
- Developed shell scripts and automated end-to-end data management and integration work.
- Extracted transactional data from various Confidential policies by writing MapReduce jobs and automating them with UNIX shell scripts.
- Involved in the requirements and design phases of a streaming Lambda architecture for real-time processing with Spark and Kafka (see the sketch after this list).
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
- Handled large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations.
- Created Hive tables and loaded and analyzed data using Hive queries.
- Transformed and loaded data into HBase tables using Pig scripts.
- Coordinated and participated in client meetings to clarify requirements for ingesting customer data for various policies.
- Worked with ORC Hive tables in a MapR environment.
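A minimal Scala sketch of the Spark Streaming and Kafka integration behind the Lambda architecture mentioned above, written against the Spark 2.x kafka010 API; the broker, group id and topic names are hypothetical:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object PolicyEventStream {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("policy-event-stream"), Seconds(10))
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",    // hypothetical broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "policy-consumers",         // hypothetical consumer group
          "auto.offset.reset" -> "latest"
        )
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("policy-events"), kafkaParams))

        // speed layer: count incoming policy events per 10-second micro-batch
        stream.map(_.value()).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }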
Environment: Hadoop, HBase, MapR, ORC, MapReduce, Spark, Spark SQL, Kafka, Storm, HDFS, Hive, Sqoop, Oozie, Java, SQL, Shell script.
Confidential, Austin, TX
Sr. Big Data Developer
Responsibilities:
- Imported and exported data into HDFS and Hive using Sqoop.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and automate several job types, including Java MapReduce, Hive, Pig and Sqoop.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Integrated Apache Storm with Kafka to perform web analytics, loading clickstream data from Kafka into HDFS, HBase and Hive through Storm.
- Migrated MapReduce jobs to Spark jobs to achieve better performance.
- Created Hive external tables, loaded data into them and queried the data using HQL.
- Managed data coming from different sources.
- Loaded and transformed large sets of structured, semi-structured and unstructured data, including joins and some pre-aggregations, before storing the data in HDFS.
- Involved in the requirements and design phases of a streaming Lambda architecture for real-time processing with Spark and Kafka.
- Used the Spark DataFrame API to process structured and semi-structured files and load them back into an S3 bucket.
- Automated and scheduled Sqoop jobs using Unix shell scripts.
- Developed Scala and Python scripts and UDFs, using both DataFrames/SQL and RDDs in Spark 1.3+, for data aggregation, queries and writing data back to the OLTP system directly or through Sqoop (see the sketch after this list).
- Created workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Analyzed large datasets to determine the optimal way to aggregate and report on them.
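A minimal Scala sketch of the DataFrame/UDF aggregation pattern described above, written against the Spark 2.x API for brevity; the Hive table, column names and JDBC target are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{sum, udf}

    object RegionAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("region-aggregation")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // hypothetical UDF: normalize free-text region codes before grouping
        val normalizeRegion = udf((raw: String) =>
          Option(raw).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

        val orders = spark.table("warehouse.orders") // hypothetical Hive table
        val totals = orders
          .withColumn("region", normalizeRegion($"region_code"))
          .groupBy("region")
          .agg(sum("amount").as("total_amount"))

        // write the aggregate back to an OLTP system over JDBC (hypothetical URL and table)
        totals.write.mode("append")
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@db-host:1521:ORCL")
          .option("dbtable", "REGION_TOTALS")
          .save()
        spark.stop()
      }
    }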
Environment: Hadoop, MapReduce, Spark, Spark SQL, Kafka, Storm, HDFS, Hive, Sqoop, Oozie, Java, SQL, Shell script.
Confidential, New York, NY
Java/Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing (see the sketch after this list).
- Performed data backup and synchronization using Amazon Web Services.
- Designed utility jobs to move data from the in-house Hortonworks platform into Amazon Redshift.
- Imported and exported data into HDFS and Hive using Sqoop.
- Configured Flume to transport web server logs into HDFS.
- Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
- Used Flume to collect, aggregate and store web log data from sources such as web servers and mobile and network devices, pushing it to HDFS.
- Worked on Amazon Web Services as the primary cloud platform.
- Migrated legacy and monolithic systems to Amazon Web Services using Packer, Terraform and Ansible.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Supported MapReduce programs running on the cluster.
- Loaded log data into HDFS using Flume and Kafka, and performed ETL integrations.
- Loaded data from several flat-file sources into staging using Teradata MultiLoad and FastLoad.
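A minimal sketch of the kind of cleansing mapper described above, shown in Scala for consistency with the other sketches (the original jobs were written in Java); the pipe-delimited five-column record layout is hypothetical:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // drops malformed rows and trims fields before downstream processing
    class CleanseMapper extends Mapper[LongWritable, Text, Text, Text] {
      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
        val fields = value.toString.split('|').map(_.trim)
        if (fields.length == 5 && fields.forall(_.nonEmpty)) { // hypothetical 5-column layout
          context.write(new Text(fields(0)), new Text(fields.mkString("|")))
        } // malformed records are silently dropped
      }
    }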
Environment: Hadoop, MapReduce, HDFS, Hive, Apache Spark, Kafka, CouchDB, Flume, AWS, Cassandra, Java, Struts, Servlets, HTML, XML, SQL, J2EE, MRUnit, JUnit, JDBC, Eclipse.
Confidential
Software Engineer
Responsibilities:
- Involved in designing the shares and cash modules using UML.
- Effectively used the iterative waterfall software development methodology on this time-constrained project.
- Used HTML and JSP for the web pages and JavaScript for client-side validation.
- Created XML pages with DTDs for front-end functionality and information exchange.
- Responsible for writing Java SAX parser programs (see the sketch after this list).
- Developed Ant build scripts to build and deploy the application in enterprise archive (.ear) format.
- Performed unit testing using JUnit, as well as functional testing.
- Used the JSON response format to retrieve data from web servers.
- Used JDBC 2.0 extensively and wrote several SQL queries for data retrieval.
- Prepared program specifications for the loans module and was involved in database design.
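A minimal SAX-parsing sketch, in Scala for consistency with the other sketches (the original parsers were written in Java); the element name, attribute name and input file are hypothetical:

    import javax.xml.parsers.SAXParserFactory
    import org.xml.sax.Attributes
    import org.xml.sax.helpers.DefaultHandler

    // collects the symbol attribute from each <share> element
    class ShareHandler extends DefaultHandler {
      var symbols: List[String] = Nil
      override def startElement(uri: String, localName: String,
                                qName: String, attrs: Attributes): Unit =
        if (qName == "share") symbols ::= attrs.getValue("symbol")
    }

    object ParseShares {
      def main(args: Array[String]): Unit = {
        val handler = new ShareHandler
        SAXParserFactory.newInstance().newSAXParser().parse("shares.xml", handler) // hypothetical file
        handler.symbols.reverse.foreach(println)
      }
    }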
Environment: Java, J2EE, EJB 2.0, Servlets, JavaScript, OO, JSP, JNDI, Java Beans, WebLogic, XML, XSL, Eclipse, PL/SQL, Oracle 8i, HTML, DHTML, UML.