
Big Data Engineer/Application Architect Resume


New York, NY

PROFESSIONAL SUMMARY:

  • Over 9 years of experience as a Big Data developer, designing and developing applications on Big Data and Python open-source technologies.
  • Strong development skills in Hadoop, HDFS, MapReduce, Hive, Sqoop, and HBase, with a solid understanding of Hadoop internals.
  • Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing, and analysis of data.
  • Leveraged strong skills in developing applications involving Big Data technologies like Hadoop, MapReduce, YARN, Flume, Hive, Pig, Sqoop, HBase, Cloudera, MapR, Avro, Spark, and Scala.
  • Extensively worked on major components of the Hadoop ecosystem like HDFS, HBase, Hive, Sqoop, Pig, and MapReduce.
  • Developed various scripts and numerous batch jobs to schedule Hadoop programs.
  • Experience in analyzing data using HiveQL and custom MapReduce programs in Java.
  • Hands-on experience in importing and exporting data between HDFS/Hive and databases like Oracle and MySQL using Sqoop.
  • Good knowledge of NoSQL databases like MongoDB, Cassandra, and HBase.
  • Expertise in writing Hadoop jobs to analyze data using MapReduce, Hive, Pig, and Splunk.
  • Experience in programming and development of Java modules for an existing web portal using JSP, Servlets, JavaScript, HTML, and Angular with MVC architecture.
  • Good knowledge of Amazon Web Services (AWS) offerings like EMR and EC2, which provide fast and efficient processing for Big Data analytics.
  • Experienced in collecting log and JSON data into HDFS using Flume and processing it with Hive/Pig.
  • Expertise in developing web-based applications using J2EE technologies like JSP, Servlets, and JDBC.
  • Experience working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
  • Worked extensively in Core Java, Struts, JSF, Spring, Hibernate, Servlets, and JSP, with hands-on experience in PL/SQL, XML, and SOAP.
  • In-depth understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
  • Extensively worked on Linux-based CentOS with strong hands-on experience in Linux commands.
  • Well versed in relational database management systems such as Oracle, MS SQL Server, and MySQL.
  • Hands-on experience with advanced Big Data technologies like the Spark ecosystem (Spark SQL, MLlib, SparkR, and Spark Streaming), Kafka, and predictive analytics.
  • Knowledge of the Software Development Life Cycle (SDLC) and Agile and Waterfall methodologies.
  • Experience in working with Eclipse IDE, NetBeans, and Rational Application Developer.
  • Experience includes requirements gathering, design, development, integration, documentation, testing, and build.

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig, Hive 2.3, Sqoop 1.4, Apache Impala 2.1, Oozie 4.3, YARN, Apache Flume 1.8, Kafka 1.1, Apache Storm 1.0, Zookeeper, Databricks

Cloud Platform: Amazon AWS, EC2, S3, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, Azure Data Lake, Data Factory

Hadoop Distributions: Cloudera, Hortonworks, MapR

Programming Languages: Java, Scala, Python 3.6, SQL, PL/SQL, Shell Scripting, JSP, Servlets

Frameworks: Spring 5.0.5, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS

Web Technologies: HTML, CSS, JavaScript, jQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX

Databases: Oracle 12c/11g, MS SQL Server, MySQL

Operating Systems: Linux, Unix, Windows 10/8/7

IDE and Tools: Eclipse 4.7, NetBeans 8.2, IntelliJ, Maven, Visual Studio

NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB, Accumulo

Web/Application Servers: Apache Tomcat 9.0.7, JBoss, WebLogic, WebSphere

SDLC Methodologies: Agile, Waterfall

Version Control: Git, SVN, CVS, AWS CodeCommit

ADDITIONAL SKILLS:

J2EE, MVC, Java, Hibernate, JSON, jQuery, Eclipse, Spring, JavaScript, Hadoop, Hive, MongoDB, Zookeeper, Spark, MapR, Pig, Sqoop, Agile, Azure, Jenkins, HDFS, NoSQL, HBase, Impala, MapReduce, YARN, Oozie, Oracle, PL/SQL, NiFi, XML, MySQL

WORK EXPERIENCE:

Confidential, New York, NY

Big Data Engineer/Application Architect

Responsibilities:

  • Working on the Hadoop ecosystem over the AWS cloud, leveraging services such as EMR, EC2, S3, CloudFormation, Lambda, Athena, Glue, DynamoDB, and AWS Cost Explorer.
  • Responsible for developing and managing analytical/machine learning capabilities on the AWS cloud across Amex.
  • Implemented solutions for ingesting data from various sources and processing data-at-rest using Big Data technologies such as Hadoop, MapReduce, Hive, and Spark RDDs.
  • Involved in designing the Hadoop architecture on AWS, leveraging Service Catalog, CloudFormation, AWS EMR, DynamoDB, and event processing using Lambda functions.
  • Developed data-governance tools using Python and Spark for securely placing enterprise data on AWS S3.
  • Designed and configured the Hadoop cluster using AWS EMR based on user behavior.
  • Responsible for designing and deploying EDW application solutions, optimizing processes, and defining and implementing best practices.
  • Built an end-to-end automated tool that extracts zip files, loads the data into the respective Hive tables in compressed format using shell scripts and PySpark RDDs, and runs QC checks (a simplified version of the load step is sketched after this list).
  • Responsible for supporting users across Amex on their data processing and modeling pipelines.
  • Provided performance tuning and query optimization techniques to users for their Hive and Spark jobs.
  • Worked closely with business users to gather requirements and troubleshoot issues with machine learning algorithms.
  • Performed data modeling using gradient boosting and tree-based algorithms such as XGBoost, GBDT, and CatBoost.
  • Worked closely with business vendors to enhance the Big Data and machine learning platforms on the AWS cloud per business needs.
  • Performed several POCs on newly onboarded AWS and Big Data services to help enhance the platform.
  • Managed and led the development effort with a diverse internal and overseas team.
  • Developed a UI application using Angular and NVD3 to display network graphs of all the interlinked customers.
  • Participated in scrum and retrospective meetings and worked closely with the scrum master to create features and stories in Jira.
  • Extensively worked in Excel, generating pivot tables and performing VLOOKUPs to join records from multiple sheets.
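
To make the load step concrete, here is a minimal PySpark sketch of the kind of compressed Hive load the automated tool performed; the staging path, table name, and QC check are illustrative assumptions, not the production code:

```python
from pyspark.sql import SparkSession

# Hypothetical sketch of the load step: a shell script is assumed to have
# already unzipped the source files into the staging directory below.
spark = (SparkSession.builder
         .appName("zip-to-hive-loader")  # assumed job name
         .config("spark.sql.parquet.compression.codec", "snappy")
         .enableHiveSupport()
         .getOrCreate())

# Read the extracted files (assumed CSV with a header row).
df = spark.read.option("header", "true").csv("/data/staging/extracted/")

# Simple QC gate: fail fast if the staging data is empty.
if df.rdd.isEmpty():
    raise ValueError("QC failed: no rows found in staging data")

# Append into the target Hive table as compressed Parquet.
df.write.mode("append").format("parquet").saveAsTable("staging.events")
```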

Environment: AWS, EMR, EC2, S3, RDS, Glue, Athena, Service Catalog, CloudFormation, Lambda, Hadoop, Spark, Hive, Python, Pandas, XGBoost, TensorFlow, AngularJS, NVD3, Linux, HDFS, Spark Streaming

Confidential, San Antonio, TX

Sr. Big Data Developer

Responsibilities:

  • As a Sr. Big Data Developer, worked on Hadoop ecosystem components including Hive, HBase, Zookeeper, and Spark Streaming on the CDH distribution.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, the HBase database, and Sqoop.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
  • Implemented security in web applications using Azure and deployed web applications to Azure.
  • Involved in writing Spark applications using Scala to perform data cleansing, validation, transformation, and summarization activities according to requirements.
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output per requirements.
  • Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
  • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Used the Java Persistence API (JPA) framework for object-relational mapping based on POJO classes.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, setting up a High Availability cluster and integrating Hive with existing applications.
  • Designed and developed a flattened view (merged and flattened dataset) by de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream systems.
  • Worked on NoSQL support for enterprise production systems, loading data into HBase using Impala and Sqoop.
  • Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 (YARN).
  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
  • Worked on MongoDB and HBase databases, which differ from classic relational databases.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming (a PySpark version of this pattern is sketched after this list).
  • Integrated Kafka with Spark Streaming for high-throughput, reliable processing.
  • Worked on Apache Flume to collect and aggregate large amounts of log data, storing it on HDFS for further analysis.
  • Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.
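
The HiveQL-to-Spark conversion mentioned above was done in Scala on the project; the same pattern, sketched here in PySpark against a hypothetical orders table, looks roughly like this:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Original HiveQL (hypothetical example):
#   SELECT customer_id, SUM(amount) AS total
#   FROM orders WHERE status = 'SHIPPED' GROUP BY customer_id;

# Equivalent Spark DataFrame transformations, executed in memory:
totals = (spark.table("orders")
          .filter(F.col("status") == "SHIPPED")
          .groupBy("customer_id")
          .agg(F.sum("amount").alias("total")))
totals.show()
```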

Environment: Hadoop 3.0, Hive 2.3, CDH4, MongoDB, Python, pandas, Zookeeper, Spark, MapR, Pig 0.17, Sqoop, Agile, Azure, Jenkins, HDFS, NoSQL, HBase, Impala, MapReduce, YARN, Oozie, Oracle 12c, PL/SQL, NiFi, XML, JSON, MySQL, Java

Confidential, Sunnyvale, CA

Big Data/Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Spark DataFrame operations to perform required validations and analytics on the Hive data.
  • Developed Apache Spark applications for data processing from various streaming sources.
  • Utilized the Agile Scrum methodology to help manage and organize a team of 4 developers, with regular code review sessions.
  • Migrated MapReduce jobs to Spark jobs to achieve better performance.
  • Extracted real-time feeds using Kafka and Spark Streaming, converting them to RDDs and processing the data as DataFrames.
  • Worked on Kafka and REST APIs to collect and load data onto the Hadoop file system, and used Sqoop to load data from relational databases.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (a simplified consumer is sketched after this list).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Involved in transforming data from legacy tables to HDFS and Hive tables using Sqoop.
  • Implemented Spark with Scala and Spark SQL for faster testing and processing of data, managing data from different sources.
  • Implemented Apache NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Involved in migrating MapReduce jobs to RDDs (Resilient Distributed Datasets), creating Spark jobs for better performance.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Developed batch scripts to fetch data from the ECS cloud and perform the required transformations in Scala using the Spark framework.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into RDBMS through Sqoop.
  • Performed transformations such as event joins, filtering of bot traffic, and pre-aggregations using Pig.
  • Developed Java code that creates mappings in Elasticsearch before data is indexed into it.
  • Configured Oozie workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Imported and exported the analyzed data to relational databases using Sqoop for visualization and for generating reports for the BI team.
  • Created and maintained various shell and Python scripts to automate processes, and optimized MapReduce code and Pig scripts through performance tuning and analysis.
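
The Kafka consumer path described above was written in Scala with Spark Streaming; a minimal PySpark (Structured Streaming) sketch of the same flow is below. The broker, topic, and message schema are assumptions, and the HBase write is stubbed out because the exact call depends on the connector in use:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-consumer").getOrCreate()

# Assumed schema for the JSON messages on the topic.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event", StringType()),
    StructField("amount", DoubleType()),
])

# Subscribe to a hypothetical Kafka topic.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
       .option("subscribe", "clickstream")                 # assumed topic
       .load())

# Kafka delivers bytes; cast the value to a string and parse the JSON.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

def write_batch(batch_df, epoch_id):
    # On the project each micro-batch was written to HBase; the connector
    # call is replaced here with a file sink as a stand-in.
    batch_df.write.mode("append").parquet("/tmp/events")

(events.writeStream
 .foreachBatch(write_batch)
 .option("checkpointLocation", "/tmp/ckpt")
 .start()
 .awaitTermination())
```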

Environment: Hadoop 3.0, Spark, Python, Hive 2.3, Agile, MapReduce, Kafka, HBase, HDFS, Sqoop, Scala, RDBMS, Oozie, Pig 0.17, Cassandra 3.11, NoSQL, Elasticsearch, Java

Confidential

Java/J2EE Developer

Responsibilities:

  • Developed and utilized J2EE services and JMS components for messaging communication in WebSphere Application Server.
  • Implemented MVC architecture by separating the business logic from the presentation layer.
  • Developed code using Java, J2EE, and Spring, and used Hibernate as an ORM tool for object-relational mapping.
  • Used JNDI to perform lookup services for the various components of the system.
  • Created REST web services to send data in JSON format to different systems using Spring Boot.
  • Extensively used jQuery to provide a dynamic user interface and for client-side validations.
  • Extensively used the Eclipse IDE for developing, debugging, integrating, and deploying the application.
  • Participated in object-oriented design, development, and testing of REST APIs using Java.
  • Implemented the Dependency Injection (IoC) feature of the Spring Framework to inject dependencies into objects.
  • Developed the data access layer by integrating Spring and Hibernate.
  • Used the Hibernate framework for data persistence, developing Hibernate objects for persisting data into the database.
  • Responsible for developing Hibernate configuration and mapping files for the persistence layer (object-relational mapping).
  • Developed object-oriented JavaScript code and was responsible for client-side validations using jQuery.
  • Extensively used Spring IoC features for bean injection and transaction management.
  • Used the Spring Framework as the middle-tier application framework, with a persistence strategy using Spring's Hibernate support to integrate with the database.
  • Involved in designing the application using the MVC pattern.
  • Created JDBC data sources and connection pooling for the application, and Hibernate mapping files when needed.
  • Consumed RESTful web services to establish communication between different applications (a minimal JSON client call is sketched after this list).
  • Implemented business services using Core Java and Spring.
  • Wrote object-oriented JavaScript for transparent presentation of both client- and server-side validation.
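
As a small illustration of consuming one of these JSON-over-REST services (the services themselves were built in Java with Spring Boot), a minimal client call against a hypothetical endpoint might look like this:

```python
import requests

# Hypothetical endpoint; the real services were internal Spring Boot APIs.
resp = requests.get(
    "http://localhost:8080/api/customers/42",
    headers={"Accept": "application/json"},
    timeout=10,
)
resp.raise_for_status()

# The services exchanged JSON, so the body parses directly to a dict.
customer = resp.json()
print(customer)
```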

Environment: J2EE, MVC, Java, Hibernate, JSON, jQuery, Eclipse, Spring, JavaScript
