
Hadoop/Big Data Developer Resume


Chicago

SUMMARY

  • Over 7 years of experience in the IT industry, with around 5 years of experience in Hadoop and the Hadoop ecosystem, Big Data analytics, Cloud Data Engineering, Data Warehousing, Data Visualization, and Reporting.
  • Experienced in the Big Data ecosystem with Hadoop, HDFS, MapReduce, Pig, Hive, HBase, Sqoop, Flume, Kafka, Oozie, NiFi, and Spark.
  • Experience in setting up and maintaining Hadoop cluster running HDFS and MapReduce on YARN.
  • Good understanding of MapReduce, which processes large data sets using a parallel, distributed algorithm. Expertise in writing MapReduce programs to parse and analyse unstructured data.
  • Experienced with distributions including Cloudera, Amazon EMR 4.x and Hortonworks.
  • Experienced in handling and further processing both schema-oriented and non-schema-oriented data. Knowledge of NoSQL databases including HBase and Cassandra.
  • Experience in developing Spark applications using Scala and Python.
  • Heavily involved in testing Snowflake to understand best possible way to use the cloud resources.
  • Experience in designing star schema, Snowflake schema for Data Warehouse, ODS architecture.
  • Good knowledge of Hadoop daemons such as NameNode, DataNode, JobTracker, and TaskTracker, and of MRv1 and YARN architecture.
  • Experienced in working with Hadoop Storage and Analytics framework over AWS cloud using tools like SSH, PuTTY, and MindTerm.
  • Extensive experience in importing and exporting data using Sqoop from HDFS/Hive/HBase to Relational Database Systems (RDBMS) and vice versa.
  • Exposure on the HBase distributed database and the ZooKeeper distributed configuration service.
  • Expertise in using Kafka for log aggregation solution with low latency processing and distributed data consumption and widely used Enterprise Integration Patterns.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Familiarity with HANA security, especially User Management, Roles, and Analytic Privileges.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, Elastic Beanstalk, EBS, Route 53, Security Groups, CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, CloudFront, CloudWatch, OpsWorks, SNS, DynamoDB, SES, SQS, Lambda, EMR, and other services of the AWS family.
  • Experienced in delivering highly available and fault-tolerant applications on Google Cloud Platform (GCP) using orchestration technologies.
  • Proficiency with Google Cloud Platform (GCP) services including but not limited to Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver Monitoring, and Cloud Deployment Manager.
  • Experience in collecting, aggregating and moving large amounts of streaming data using Flume, Kafka, RabbitMQ, Spark Streaming.
  • Experience in designing both time driven, and data driven automated workflows using Oozie.
  • Strong Database Experience on RDBMS (Oracle, MySQL) with PL/SQL programming skills in creating Packages, Stored Procedures, Functions, Triggers & Cursors.
  • Strong experience in Machine Learning and Data Mining using R, Python (scikit-learn/TensorFlow), and Spark MLlib/ML. Algorithms including K-Means, Regression, HMM, SVM, AdaBoost, and Neural Networks.
  • Experience in data visualization using Tableau, D3.js, matplotlib and IBM Cognos.
  • Worked in different methodologies such as Waterfall, Spiral, and Agile/SCRUM.
  • Hands-on experience in using Google Cloud Platform for BigQuery, Cloud Dataproc, and Apache Airflow services.
  • Hands-on experience working with Apache Airflow clusters, including setting up, configuring, monitoring, and troubleshooting an Airflow cluster on Azure. Also implemented Airflow DAG deployment using a CI/CD pipeline.
  • Skilled in monitoring servers using Nagios, Datadog, and CloudWatch, and in using the ELK stack (Elasticsearch, Logstash).
  • Extensively used Stash, Bitbucket, and GitHub for code control purposes.
  • Extensive Experience in Unit Testing with JUnit, MRUnit, Pytest.
  • Front End experience with HTML4/5, CSS3, JavaScript, AJAX, jQuery, AngularJS, Bootstrap.
  • Extensive experience in Java development skills using Spring, J2SE, Struts, Servlets, JUnit, JSP, JDBC, Hibernate, and JPA for object mapping with the database.
  • Strong problem-solving skills, good interpersonal skills and an excellent team player.

TECHNICAL SKILLS

HADOOP ECOSYSTEM: HDFS, MapReduce V1, MapReduce V2, YARN, Hive, Pig, Sqoop, ZooKeeper, Storm, NiFi, Flume, Kafka, RabbitMQ, Spark, Oozie, Avro, MRUnit.

NoSQL DATABASES: MongoDB, Cassandra, HBase

RELATIONAL DATABASES: Oracle 11g/10g/9i, MySQL 5.0, Microsoft SQL Server 9.0, PostgreSQL 8.0

LANGUAGES: Java, Scala, Python, SQL, HiveQL, Pig Latin

DATA ANALYSIS & VIZ: Matlab, Mathematica, Tableau, Matplotlib, D3.js

SCRIPTING: UNIX Shell Scripting, LINUX

CLOUD: AWS EMR, EC2, S3, RDS, GCP, Dataproc, Dataflow, Cloud Functions, BigQuery, Azure Databricks, Azure Data Factory.

OS: Linux, Windows, Mac OS

MACHINE LEARNING: Regression, Neural Network, K-Means, HMM, SVM, NLP, Adaboost.

WEB TECHNOLOGIES: HTML, CSS, JavaScript, XML, Ajax, jQuery.

J2EE TECHNOLOGIES: Spring, Servlets, J2SE, JSP, JDBC

PROFESSIONAL EXPERIENCE

Confidential, Chicago

Hadoop/Bigdata Developer

Responsibilities:

  • Involved in all phases of software engineering including requirements analysis, design, code development, and testing.
  • Processed data into HDFS by developing solutions, analysed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
  • Used Spark to import customer information data from Oracle database into HDFS for data processing along with minor cleansing.
  • Developed MapReduce jobs to calculate the total usage of data by commercial routers in different locations using the Hortonworks distribution.
  • Involved in information gathering for new enhancements in Spark Scala, production support for field issues, and label installs for Hive scripts and MapReduce jobs.
  • Developed Spark applications in Python on a distributed environment to load large numbers of CSV files with different schemas into Hive tables.
  • Used Maven to build RPMs from Scala source code checked out from the Git repository, with Jenkins as the continuous integration server and Artifactory as the repository manager.
  • Responsible for Setting up Linux environments for various applications using shell scripting.
  • Stored data in AWS S3 (similar to HDFS) and ran EMR programs on the data stored in S3.
  • Used the AWS Glue ETL service to consume raw data from an S3 bucket, transform it as per the requirements, and write the output back to an S3 bucket in Parquet format for data analytics (see the Glue sketch after this list).
  • Developed a Python script to load the CSV files into the S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket (a simplified loader sketch follows this list).
  • Created Apache Airflow DAGs using Python (an example DAG follows this list). Worked with CloudFormation to automate AWS environment creation, along with the ability to deploy resources using build scripts (Boto3 and the AWS CLI).
  • Set up scalability for application servers using the command-line interface, and set up and administered the DNS system in AWS using Route 53.
  • Wrote Python scripts to update content in the database and manipulate files. Involved in building database models, APIs, and views utilizing Python technologies to build applications.
  • Visualized and manipulated data using Python libraries such as NumPy, SciPy, and pandas for thorough analysis of the data.
  • Translated customer business requirements into technical design documents, established specific solutions, and led the efforts, including Spark Scala programming and testing, that culminated in client acceptance of the results.
  • Expertise in Object-Oriented Design (OOD) and end-to-end software development experience working on Scala coding and implementing mathematical models in Spark Analytics.
  • Created Hive external tables on top of datasets loaded into AWS S3 buckets and created various Hive scripts to produce a series of aggregated datasets for downstream analysis.
  • Used AWS Lambda to perform data validation, filtering, sorting, and other transformations for every data change in an HBase table, and loaded the transformed data into RDS.
  • Loaded data from different servers to S3 buckets and set appropriate bucket permissions.
  • Configured routing to send JMS messages to interact with the application for real-time data using Kafka.
  • Managed Zookeeper for cluster co-ordination and Kafka Offset monitoring.
  • Optimized legacy queries to extract the customer information from Oracle.
  • Reviewed HDFS usage and system design for future scalability and fault tolerance.
  • Strong Experience in implementing Data warehouse solutions in Confidential Redshift.
  • Worked on various projects to migrate data from on premise databases to Confidential Redshift and RDS.
  • Used Tableau to generate weekly reports for the customer.
  • Worked on Informatica Data Quality, which provides an extensive array of cleansing and standardization capabilities.
  • Implemented Kerberos Security Authentication protocol for existing cluster.
  • Worked on CI/CD pipeline for code deployment by engaging different tools (GIT, Jenkins, Code Pipeline) in the process right from developer code check in to production development.
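
A minimal sketch of the kind of Glue job described above (raw S3 data transformed and written back as Parquet). The database, table, field mappings, and S3 output path below are placeholders, not the project's actual names:

    # Hypothetical Glue job: read raw data via the Glue Data Catalog, trim/rename
    # fields, and write curated Parquet output back to S3.
    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Raw data registered in the Glue Data Catalog (placeholder names).
    raw = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="customer_events"
    )

    # Keep and rename only the fields needed downstream.
    mapped = ApplyMapping.apply(
        frame=raw,
        mappings=[("id", "string", "customer_id", "string"),
                  ("ts", "string", "event_time", "string"),
                  ("amount", "double", "amount", "double")],
    )

    # Write the curated output as Parquet for analytics (placeholder bucket).
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": "s3://curated-bucket/customer_events/"},
        format="parquet",
    )
    job.commit()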
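
A simplified sketch of the CSV-to-S3 loader mentioned above; the bucket name, prefix, and local directory are placeholder values:

    # Upload every .csv file from a local directory into an S3 bucket (boto3).
    import os
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-raw-data-bucket"   # placeholder bucket name

    def upload_csv_files(local_dir, prefix):
        """Upload each .csv under local_dir to BUCKET under the given prefix."""
        for name in os.listdir(local_dir):
            if not name.endswith(".csv"):
                continue
            key = "{}/{}".format(prefix, name)
            s3.upload_file(os.path.join(local_dir, name), BUCKET, key)
            print("uploaded s3://{}/{}".format(BUCKET, key))

    if __name__ == "__main__":
        upload_csv_files("./exports", "incoming/csv")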
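
A minimal Airflow DAG sketch (Airflow 2.x imports) illustrating the DAG-plus-AWS-CLI pattern referenced above; the DAG id, schedule, local path, bucket, and Glue job name are hypothetical:

    # Daily DAG: stage local CSV exports to S3, then trigger a Glue transformation.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-eng",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }

    with DAG(
        dag_id="csv_to_s3_daily",            # placeholder DAG id
        default_args=default_args,
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Stage raw CSV files into S3 with the AWS CLI (placeholder paths).
        stage_to_s3 = BashOperator(
            task_id="stage_to_s3",
            bash_command="aws s3 sync /data/exports s3://example-raw-data-bucket/incoming/csv",
        )

        # Kick off the downstream Glue job (placeholder job name).
        run_transform = BashOperator(
            task_id="run_transform",
            bash_command="aws glue start-job-run --job-name example-etl-job",
        )

        stage_to_s3 >> run_transform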

Confidential, Atlanta

Big Data developer

Responsibilities:

  • Worked as a Hadoop developer with Hadoop ecosystem components, especially HBase, Sqoop, Zookeeper, Oozie, and Hive.
  • Implemented the data management framework for building a Data Lake.
  • Involved in the Agile development methodology as an active member in scrum meetings.
  • Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
  • Designed a Data Lake storage solution for a data science project using Azure Data Factory pipelines.
  • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analysing and transforming the data to uncover insights (see the Spark SQL sketch after this list).
  • Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake schemas.
  • Experience in dimensional modelling (star schema, snowflake schema), transactional modelling, and SCDs (slowly changing dimensions).
  • Managed and supported enterprise Data Warehouse operations and big data advanced predictive application development using Cloudera and Hortonworks HDP.
  • Worked with Hadoop, MapReduce, HDFS, and Azure to develop multiple MapReduce jobs in Hive for data cleansing and pre-processing.
  • Developed Spark code to improve the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Performed transformations, cleaning and filtering on imported data using Hive, MapReduce, and loaded final data into HDFS.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Created Hive tables to load large data sets of structured data coming from WADL after transformation of raw data.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Developed a Spark job in Java which indexes data into Elastic Search from external Hive tables which are in HDFS.
  • Imported the data from different sources like HDFS/HBase into Spark RDD and developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Used the KafkaUtils module in PySpark to create an input stream that directly pulls messages from the Kafka brokers (see the streaming sketch after this list).
  • Used Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS and in NoSQL databases such as HBase and Cassandra, using Scala.
  • Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
  • Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
  • Developed JSON scripts for deploying the pipeline in Azure Data Factory (ADF) that processes the data using the SQL Activity.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Extensively worked on creating end-to-end ETL pipeline orchestration using NiFi.
  • Configured Oozie workflow to run multiple Hive jobs which run independently with time and data availability.
  • Imported and exported the analysed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
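
A short sketch of the Spark SQL extraction-and-aggregation pattern described above, assuming CSV and JSON sources; the mount paths, column names, and join keys are placeholders:

    # Read two differently formatted sources, join/aggregate with Spark SQL,
    # and persist the result as Parquet for downstream analysis.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi-format-aggregation").getOrCreate()

    orders = spark.read.option("header", "true").csv("/mnt/raw/orders/*.csv")
    customers = spark.read.json("/mnt/raw/customers/*.json")

    orders.createOrReplaceTempView("orders")
    customers.createOrReplaceTempView("customers")

    daily_revenue = spark.sql("""
        SELECT c.region,
               to_date(o.order_ts)            AS order_date,
               SUM(CAST(o.amount AS DOUBLE))  AS revenue
        FROM orders o
        JOIN customers c ON o.customer_id = c.customer_id
        GROUP BY c.region, to_date(o.order_ts)
    """)

    daily_revenue.write.mode("overwrite").parquet("/mnt/curated/daily_revenue")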
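
A sketch of the KafkaUtils direct stream mentioned above, using the older Spark 1.x/2.x DStream API (spark-streaming-kafka-0-8, removed in Spark 3); the broker address, topic, and HDFS path are placeholders:

    # Pull messages directly from Kafka (no receiver) and land each batch on HDFS.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-to-hdfs")
    ssc = StreamingContext(sc, batchDuration=30)   # 30-second micro-batches

    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["clickstream"],                                # placeholder topic
        kafkaParams={"metadata.broker.list": "broker1:9092"},  # placeholder brokers
    )

    # Records arrive as (key, value) pairs; keep the value and write per batch.
    stream.map(lambda kv: kv[1]) \
          .saveAsTextFiles("hdfs:///data/streams/clickstream/batch")

    ssc.start()
    ssc.awaitTermination()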

Confidential, Peoria

Hadoop developer

Responsibilities:

  • Developed ETL jobs across multiple platforms using Spark Scala, Hadoop, and Vertica.
  • Configured the Hive metastore with the Vertica database and vice versa using Sqoop.
  • Designed a data flow to pull data from a REST API using Apache NiFi with SSL context configuration enabled.
  • Created a POC demonstrating retrieval of JSON data by calling a REST service, converting it into CSV through a NiFi data flow, and loading it into Vertica by calling a Unix script from NiFi.
  • Developed custom processors in Java using Maven to add functionality to Kafka for additional tasks.
  • Wrote complex SQL queries and PL/SQL stored procedures and converted them into ETL tasks in Spark.
  • Created and maintained documents related to business processes, mapping design, data profiles, and tools.
  • Extracted weblogs using a Spark Streaming job written in Scala and continuously tracked through Oozie.
  • Wrote PySpark jobs with RDDs, pair RDDs, transformations and actions, and DataFrames for data transformations from relational sets (see the sketch after this list).
  • Automated the development environment using Vagrant and Shell provisioning.
  • Used Aqua Data Studio to work with the Vertica database for performance tuning and visual analysis.
  • Created dashboards and data sets using Tableau for business decision-making and for estimating sales by location.
  • Developed UDFs to extract and trim the raw data using Spark Scala.
  • Responsible for developing Spark code for extracting data using the JSON reader function.
  • Connected to Tableau Server to publish dashboards to a central location for portal integration.
  • Created visual trends and calculations in Tableau on customers and product data as per client requirements.
  • Designed Splunk Dashboards for monitoring the pipeline jobs in production.
  • Created Alerts using Splunk for failed and late-running jobs.
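
A brief sketch of the RDD, pair-RDD, and DataFrame transformation style described above; the input path and column names are illustrative only:

    # Relational extract landed as JSON, processed two ways: pair-RDD and DataFrame.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("relational-transforms").getOrCreate()

    events = spark.read.json("hdfs:///data/extracts/events/*.json")

    # Pair-RDD style: count events per account via transformations, then an action.
    counts = (events.rdd
              .map(lambda row: (row["account_id"], 1))   # transformation -> pair RDD
              .reduceByKey(lambda a, b: a + b))          # transformation
    print(counts.takeOrdered(10, key=lambda kv: -kv[1]))  # action

    # Equivalent DataFrame expression, usually preferred for relational-shaped data.
    events.groupBy("account_id").count() \
          .write.mode("overwrite").parquet("hdfs:///data/curated/event_counts")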

Confidential

Data Analyst

Responsibilities:

  • Worked in IT services with the Data Management team for different customer teams (enrolment, student help, HR, payroll, etc.) to secure the data and create different reports based on the requirements.
  • Worked on building the cubes and reports on Cognos.
  • Created the business metric KPI to evaluate factors for different modules.
  • Created the catalogue in Data Manager, including fact builds, dimension builds, and reference dimensions, and customized, created, and deployed ETL jobs from the transaction data.
  • Worked on database migration, upgrade and maintenance.
  • Cleansed, mapped, and transformed data; created the job stream; and added and deleted components in the job stream in Data Manager based on the requirements.
  • Created the look-up tables for the data processing.
  • Used statistical R packages and R programming for quantitative factor analysis and k-means clustering.
  • Performed big data architectural analysis and Hadoop ecosystem migration with VMs.

Confidential

Java Developer

Responsibilities:

  • Worked on both WebLogic Portal 9.2 for portal development and WebLogic 8.1 for Data services programming.
  • Developed the presentation layer using JSP, HTML, CSS, and client validations using JavaScript.
  • Used GWT to send Ajax requests to the server and update data in UI dynamically.
  • Developed Hibernate 3.0 in Data Access Layer to access and update information in the database.
  • Used JDBC, SQL, and PL/SQL programming for storing, retrieving, and manipulating the data.
  • Involved in designing and developing the eCommerce site using JSP, Servlets, EJBs, JavaScript, and JDBC.
  • Used Eclipse 6.0 as IDE for application development and configured Struts framework to implement MVC design patterns.
  • Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer.
  • Designed and developed GUI using JSP, HTML, and CSS. Worked with JMS for messaging interface.
  • Used XML for ORM mapping of relations between the Java classes and the database.
  • Used Subversion as the version control system. Extensively used Log4j for logging the log files.
