
Big Data Engineer Resume


Mount Laurel, NJ

SUMMARY

  • 6 years of IT experience in various domains with the Hadoop ecosystem and Java/J2EE technologies.
  • Solid foundation in mathematics, probability and statistics, with broad practical statistical and data mining skills cultivated through industry work and academic programs.
  • Involved in all Software Development Life Cycle (SDLC) phases, including analysis, design, implementation, testing and maintenance.
  • Experience integrating relational databases with graph databases (Neo4j) and importing data from relational stores.
  • Strong technical, administration and mentoring knowledge in Linux and Big Data/Hadoop technologies.
  • Hands-on experience with major components of the Hadoop ecosystem such as MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume and Avro.
  • Solid understanding of RDD operations in Apache Spark, i.e., transformations and actions, persistence (caching), accumulators, broadcast variables and broadcast optimization.
  • In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAGScheduler, TaskScheduler, stages and tasks.
  • Familiar with the HDFS NFS gateway, which works on all operating systems with Cloudera Manager 5.0.1 or later and CDH 5.0.1 or later.
  • Migrated Python machine learning modules to scalable, high-performance and fault-tolerant distributed systems such as Apache Spark.
  • Strong experience with Spark SQL UDFs, Hive UDFs and Spark SQL performance tuning. Hands-on experience working with input file formats such as ORC, Parquet, JSON and Avro.
  • Extensively worked in Mainframe/UNIX and Informatica environments to invoke Teradata utilities and handle files.
  • Good expertise in coding in Python, Scala and Java.
  • Populated AWS Redshift tables using Python.
  • Wrote test scripts in Java and executed them through Selenium and Cucumber.
  • Designed and developed applications using Spring MVC, JavaScript and HTML.
  • Used Hibernate and Spring JDBC to connect to Oracle databases and retrieve data.
  • Good understanding of the MapReduce framework architectures (MRv1 and YARN).
  • Good knowledge and understanding of Hadoop architecture and the various components of the Hadoop ecosystem - HDFS, MapReduce, Pig, Sqoop and Hive.
  • Used the Podium tool with Sqoop for data ingestion.
  • Built a data lake on top of Amazon's cloud storage platform, S3.
  • Tools used: Spark v2.1.1, Scala v2.11.8, Sqoop v1.4.6, Oozie v4.3.0, UNIX shell scripting, Amazon EMR v5.7.0, AWS CloudWatch, Amazon S3, AWS Lambda, Bitbucket (for version control and code repository), and Jenkins (for code build and deployment).
  • Ingested data into Hive tables built on top of Amazon S3 via Spark SQL and processed it for further use (a minimal sketch follows this summary).
  • Cleaned up older partitions and temporary tables and archived the raw source files using UNIX shell scripts.
  • Handled importing data from various data sources, performed transformations using MapReduce and Spark, and loaded the data into HDFS.
  • Expertise in client-side design and validation using HTML and JavaScript.
  • Excellent communication and interpersonal skills; a detail-oriented, analytical and responsible team player with a high degree of self-motivation, the ability to coordinate in a team environment and a quick-learning mindset.
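
As an illustration of the Spark SQL ingestion and UDF work summarized above, here is a minimal PySpark sketch; the S3 bucket, database, table and column names are hypothetical, and it assumes the hadoop-aws (s3a) connector is available.

    # Minimal PySpark sketch: ingest raw JSON from S3 into a Snappy-compressed,
    # Parquet-backed Hive table via Spark SQL, with one example Spark SQL UDF.
    # Bucket, database, table and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf, col, lit
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
             .appName("s3-to-hive-ingest")
             .enableHiveSupport()          # lets Spark SQL read/write Hive tables
             .getOrCreate())

    # Example Spark SQL UDF: normalize a free-text region code (illustrative only).
    normalize_region = udf(lambda r: (r or "UNKNOWN").strip().upper(), StringType())

    raw = spark.read.json("s3a://example-raw-bucket/events/2019-06-01/")  # hypothetical path

    cleaned = (raw
               .withColumn("region", normalize_region(col("region")))
               .withColumn("load_dt", lit("2019-06-01"))
               .filter(col("event_id").isNotNull()))

    # Append as Snappy-compressed Parquet into a Hive table partitioned by load date.
    (cleaned.write
            .mode("append")
            .format("parquet")
            .option("compression", "snappy")
            .partitionBy("load_dt")
            .saveAsTable("analytics.events_curated"))  # hypothetical database.table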

TECHNICAL SKILLS

Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera, Podium.

Big data distribution: Cloudera, Amazon EMR

Programming languages: Core Java, Scala, Python, SQL, Shell Scripting

Operating Systems: Windows, Linux (Ubuntu)

AWS: Amazon S3, Lambda, CloudWatch

Databases: Oracle, SQL Server

Designing Tools: Eclipse

Java Technologies: JSP, Servlets, Junit, Spring, Hibernate

Web Technologies: XML, HTML, JavaScript, JVM, jQuery, JSON

Linux Experience: System Administration Tools, Puppet, Apache

Web Services: Web Service (RESTful and SOAP)

Frameworks: Jakarta Struts 1.x, Spring 2.x

Development methodologies: Agile, Waterfall

Logging Tools: Log4j

Application / Web Servers: CherryPy, Apache Tomcat, WebSphere

Messaging Services: ActiveMQ, Kafka, JMS

Version Tools: Git, SVN and CVS

Analytics: Tableau, SPSS, SAS EM and SAS JMP

PROFESSIONAL EXPERIENCE

Confidential, Mount Laurel, NJ

Big Data Engineer

Responsibilities:

  • Hadoop Processing Framework: an ecosystem of stateless, configurable and scalable frameworks built on Python 3.5 to address the enterprise need of migrating data in and out of the Hadoop cluster from multiple sources, including ASCII/EBCDIC files, Sqoop in/out of relational databases, Hive to Hive, Hive to Oracle, Hive to file and vice versa; it also included a Delta framework for change data capture, a Date Maker to handle multiple calendars, and a File Mover to SCP/TIBCO files in and out of the Hadoop cluster (a minimal sketch of this configuration-driven pattern follows this list).
  • Customer Insight Engine: Confidential's marketing campaign analytics built on Exadata, integrating multiple lines of business using Autosys and TIBCO; migrated batch processes built on Informatica to the Hadoop cluster using the Hadoop Processing Framework.
  • Consumer Lending Data Mart: data mart built on the Hadoop cluster using Apache Hive, Talend Open Studio and Apache Oozie to integrate Commercial, Small Business and Personal Loans.
  • Performed both data ingestion and data extraction tasks.
  • Worked with COBOL copybooks for ingestion.
  • Performed data ingestion from various data sources.
  • Worked with various types of databases, including SQL, NoSQL and relational stores, for transferring data to and from HDFS.
  • Used Impala for data processing on top of Hive tables.
  • Experience exporting data with Sqoop through the Podium tool and extracting data to downstream systems.
  • Ensured the continuous availability of mission-critical MongoDB clusters.
  • Facilitated meetings with integration partners.
  • Understanding of data storage and retrieval techniques and databases, including graph stores, relational databases, tuple stores, NoSQL, Hadoop, Pig, MySQL and Oracle.
  • Experience in using Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by using Gzip, LZO, Snappy compression techniques.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience in loading and transforming huge sets of structured, semi-structured and unstructured data.
  • Worked on different file formats such as XML files, Sequence files, JSON, CSV and Map files using MapReduce programs.
  • Continuously monitored and managed Hadoop cluster using Cloudera Manager.
  • Performed POCs using the latest technologies, such as Spark, Kafka and Scala.
  • Used Cassandra for permanent data storage.
  • Created Hive tables, loaded them with data and wrote Hive queries.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Experience in managing and reviewing Hadoop log files.
  • Executed test scripts to support test driven development and continuous integration.
  • Set up Oozie workflows to run several MapReduce jobs.
  • Scheduling and deployment: scheduled jobs using Autosys, deployed processes to the test environment for the QA effort and, after QA sign-off, deployed jobs to the production environment using a DevOps pipeline (GitHub, Jenkins, Nexus).
  • Primarily involved in gathering business requirements and online system specifications and building up a solution.
  • Created GitHub repositories and coordinated work with developers to develop applications, reviewed and merged code, and released builds.
  • Built scripts to migrate tasks/processes (YAML files) for deployment in the DEV/SIT/PAT/PROD environments, cloning from GitHub.
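
As an illustration of the configuration-driven ingestion pattern used by the Python-based Hadoop Processing Framework above, here is a minimal sketch; the YAML keys, connection string and table names are hypothetical, and it assumes the Sqoop CLI and PyYAML are available.

    # Minimal sketch of a YAML-driven ingestion task: read a task definition and
    # run a Sqoop import into Hive. All names and connection details are hypothetical.
    import subprocess
    import yaml  # PyYAML

    TASK_YAML = """
    source_jdbc: jdbc:db2://db2host:50000/SALESDB
    source_table: CUSTOMER
    hive_database: staging
    target_dir: /data/raw/customer
    num_mappers: 4
    """

    def build_sqoop_import(task: dict) -> list:
        """Translate a YAML task definition into a Sqoop import command line."""
        return [
            "sqoop", "import",
            "--connect", task["source_jdbc"],
            "--table", task["source_table"],
            "--target-dir", task["target_dir"],
            "--hive-import",
            "--hive-database", task["hive_database"],
            "--num-mappers", str(task["num_mappers"]),
        ]

    if __name__ == "__main__":
        task = yaml.safe_load(TASK_YAML)
        cmd = build_sqoop_import(task)
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # assumes Sqoop is on the PATH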

Environment: Cloudera 5.x Hadoop, Linux, IBM DB2, HDFS, YARN, Impala, Pig, Hive, Sqoop, Spark, Scala, HBase, MapReduce, Hadoop Data Lake, Informatica BDM 10

Confidential, Boston, MA

Hadoop Developer

Responsibilities:

  • Supported all business areas of ADAC with critical data analysis that helped team members make profitable decisions, acting as a forecasting expert and business analyst and utilizing tools for business optimization and analytics.
  • As part of the Executive Analytics function, took the lead in delivering analyses and ad-hoc reports, including data extraction and summarization using the big data tool set.
  • Ensured technology roadmaps were incorporated into data and database designs.
  • Extensive experience in extracting large data sets.
  • Extensively used ZooKeeper as a job scheduler for Spark jobs.
  • Worked on generating reports from a Neo4j graph database.
  • Worked on data processing using Spark and Python, and used Apache NiFi to copy data from the local file system to HDFS.
  • Developed and maintained Maven build scripts on Jenkins to create application builds from the Git source code repository.
  • Assisted release management with feature releases to production environments.
  • Provisioned AWS resources for load-testing automation.
  • Classified keywords into categories using a Neo4j graph database and generated recommendations.
  • Experience in data management and analysis technologies like Hadoop, HDFS.
  • Created list and summary view reports.
  • Created a data pipeline on a NiFi cluster.
  • Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, Hive and Impala.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Experience in managing and reviewing Hadoop log files.
  • Worked on data streaming architecture with Apache Flink, developing real-time streaming Flink applications.
  • Created feature, develop and release branches in Git for different applications to support releases and CI builds.
  • Communicated with the business and worked to understand problems from a business perspective rather than purely a developer's perspective.
  • Implementations were done using the Spark APIs and Spark SQL written in Python (see the sketch after this list).
  • Implemented cost- and resource-optimized solutions, considering SQL licenses and EC2 instance types, evaluating the available options and making decisions.
  • Experience with configuration of Hadoop Ecosystem components: Hive, Spark, Drill, Impala, HBase, Pig, Sqoop, Mahout, Zookeeper and Flume.
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets.
  • Commissioned and decommissioned DataNodes on the current Hadoop cluster.
  • Used AWS S3 and local hard disk as the underlying file system (HDFS) for Hadoop.
  • Expertise in using Teradata SQL Assistant, Teradata Manager and data load/export utilities such as BTEQ, FastLoad, MultiLoad and FastExport, with exposure to TPump in a UNIX environment.
  • Used Hive Query language (HQL) to further analyze the data to identify issues and behavioral patterns.
  • Used Apache Spark and Scala language to find patients with similar symptoms in the past and medications used for them to achieve best results.
  • Configured Apache Sentry server to provide authentication to various users and provide authorization for accessing services that they were configured to use.
  • Used Pig as an ETL tool to perform transformations, joins and pre-aggregations before loading data into HDFS.
  • Worked on large sets of structured, semi-structured and unstructured data.
  • Prepared the Unit Test Plan and System Test Plan documents.
  • Prepared and executed unit test cases; performed troubleshooting and debugging.
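
As an illustration of the Spark SQL work in Python mentioned above, here is a minimal sketch; the Hive table, column names and date filter are hypothetical.

    # Minimal PySpark / Spark SQL sketch (hypothetical table and column names):
    # group patient visits by symptom and collect the medications used.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("patient-symptom-analysis")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical Hive table with columns: patient_id, symptom, medication, visit_date
    visits = spark.table("healthcare.patient_visits")
    visits.createOrReplaceTempView("patient_visits")

    # For each symptom, count distinct patients and collect the medications used,
    # so past patients with similar symptoms can be compared.
    similar = spark.sql("""
        SELECT symptom,
               COUNT(DISTINCT patient_id) AS patient_count,
               COLLECT_SET(medication)    AS medications_used
        FROM patient_visits
        WHERE visit_date >= '2016-01-01'
        GROUP BY symptom
        ORDER BY patient_count DESC
    """)

    similar.show(20, truncate=False)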

Environment: Cloudera Hadoop, Linux, HDFS, MapReduce, Hive, Pig, Sqoop, Oracle, SQL Server, Eclipse, Java and Oozie scheduler.

Confidential

Hadoop Developer - Health Care

Responsibilities:

  • Implemented Big Data platforms using Cloudera CDH4 as data storage, retrieval and processing systems.
  • Involved in installing, configuring and managing Hadoop Ecosystem components like Hive, Pig, Sqoop and Flume.
  • Responsible for loading data from Teradata database into a Hadoop Hive data warehousing layer, and performing data transformations using Hive.
  • Partitioned the collected patient data by disease type and medication prescribed to the patient to improve query performance.
  • Used Hive Query Language (HQL) to further analyze the data to identify issues and behavioral patterns.
  • Used Apache Spark and Scala 2.12 to find patients with similar symptoms in the past and the medications used for them to achieve the best results.
  • Configured Apache Sentry server to provide authentication to various users and provide authorization for accessing services that they were configured to use.
  • Worked on Kafka while dealing with raw data, transforming it into new Kafka topics for further consumption.
  • Installed and configured the Hortonworks Sandbox as part of a POC involving a Kafka-Storm-HDFS data flow.
  • Worked on the Hortonworks Data Platform for managing nodes in the Hadoop cluster.
  • Log data from web servers across the environments is pushed into the associated Kafka topic partitions, and Spark SQL is used to calculate the most prevalent diseases in each city from this data (see the sketch after this list).
  • Implemented ZooKeeper for job synchronization.
  • Deployed NoSQL database HBase to store the outputs of the jobs.
  • Wrote test scripts in Java and executed them through Selenium and Cucumber.
  • Supported operations team in Hadoop cluster maintenance activities including commissioning and decommissioning nodes and upgrades.
  • Extended the core functionality of the Hive query language by writing UDFs and UDAFs.
  • Worked on Oozie workflow engine for job scheduling.
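
As an illustration of the Kafka-to-Spark SQL flow described above, here is a minimal sketch using Spark Structured Streaming (one possible approach); the broker, topic, schema and field names are hypothetical, and it assumes the spark-sql-kafka connector is on the classpath.

    # Minimal sketch: read web-server log events from Kafka and keep a running
    # count of disease mentions per city with Spark SQL aggregations.
    # Broker, topic, schema and field names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType

    spark = (SparkSession.builder
             .appName("disease-prevalence-by-city")
             .getOrCreate())

    log_schema = StructType([
        StructField("city", StringType()),
        StructField("disease", StringType()),
    ])

    raw = (spark.readStream
           .format("kafka")                                     # needs the spark-sql-kafka connector
           .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
           .option("subscribe", "patient-logs")                 # hypothetical topic
           .load())

    events = (raw.selectExpr("CAST(value AS STRING) AS json")
                 .select(from_json(col("json"), log_schema).alias("e"))
                 .select("e.*"))

    # Running count per (city, disease); 'complete' output mode keeps the full
    # aggregate so the most prevalent diseases in each city can be read off.
    prevalence = events.groupBy("city", "disease").count()

    query = (prevalence.writeStream
             .outputMode("complete")
             .format("console")
             .option("truncate", "false")
             .start())

    query.awaitTermination()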

Environment: Apache Hadoop, Hortonworks, SQL, Sqoop, Hive, HBase, Pig, Oozie, Flume, Linux, Java 7, Eclipse 3.0, Tomcat 4.1, MySQL.

Confidential

Java Developer

Responsibilities:

  • Understood user requirements, participated in design discussions and implementation feasibility analysis at both the front-end and back-end levels, and documented requirements.
  • Using RUP and Rational Rose, developed use cases and created class, sequence and UML diagrams.
  • Modeled the application, developing class diagrams, sequence diagrams and architecture/deployment diagrams using IBM Rational Software Modeler and publishing them to the web perspective with Javadoc.
  • Participated in design review sessions for development/implementation discussions.
  • Designed and coded presentation (GUI) JSPs with Struts tag libraries for creating Product Service Components (health care codes) using RAD.
  • Developed test cases and performed unit testing using JUnit.
  • Coded Action classes, JavaBeans, service layers and business delegates to implement business logic with the latest features of JDK 1.5, such as annotations and generics.
  • Extensive use of AJAX and JavaScript for front-end validations, and JavaScript based component development using EXT JS Framework with cross browser support.
  • Made appropriate use of session handling and data scope levels within the application.
  • Designed and developed the DAO layer to Hibernate 3.0 standards to access data from the IBM DB2 database through a JPA (Java Persistence API) layer, creating object-relational mappings and writing PL/SQL procedures and functions.
  • Integrated Spring injection for DAOs to achieve inversion of control, updating Spring configurations for managing Java objects using callbacks.
  • Integrated the application with Spring Web Services to fetch data from an external Benefits application using an SOA architecture, configuring WSDL based on SOAP specifications and marshalling/unmarshalling using JAXB.
  • Prepared and executed JUnit test cases to test the application service layer operations before DAO integration.
  • Created test environments with WAS for local testing using a test profile, and interacted with Software Quality Assurance (SQA) to report and fix defects using Rational ClearQuest.
  • Created views and checked code into IBM Rational ClearCase for source code control.
  • Developed microservices with Spring Boot and Spring Batch, along with test cases.
  • Resolved QA defects, scheduled fixes and provided support for the production application.

Environment: Java: JDK 1.5, JSP, JSP Custom Tag Libraries, JavaScript, EXT JS, AJAX, XSLT, XML, DOM4J 1.6, EJB, DHTML, Web Services, SOA, WSDL, SOAP, JAXB, IBM RAD, IBM WebSphere Application Server, IBM DB2 8.1, UNIX, UML, IBM Rational ClearCase, JMS, Spring Framework, Hibernate 3.0, PL/SQL, JUnit 3.8, log4j 1.2, Ant 2.7.
