
Sr. Big Data Developer Resume


Rosemont, IL

SUMMARY:

  • 6+ years of experience as a Hadoop/Spark/Big Data developer working across the Hadoop and Spark ecosystems.
  • Experience in Hadoop Ecosystem components like MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Oozie and Zookeeper.
  • Excellent knowledge of Hadoop architecture components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Good Knowledge in writing Spark Applications in Scala and Java.
  • Hands-on experience designing and developing Spark applications in Scala, including comparing the performance of Spark with Hive and SQL/Oracle.
  • Experienced working with Spark Streaming, Spark SQL and Kafka for real-time data processing.
  • Hands-on experience with Spark DataFrames, Spark SQL and the Spark RDD API for performing data transformations and building datasets.
  • Imported data from sources such as AWS S3 and the local file system into Spark RDDs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
  • Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
  • Used Spark DataFrame operations to perform required validations on the data.
  • Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions; good knowledge of Spark architecture and real-time streaming with Spark.
  • Loaded CSV/TXT/Avro/Parquet files into Spark using Scala/Java, processed the data as Spark DataFrames and RDDs, and saved the results in Parquet format on HDFS (see the Spark sketch after this list).
  • Experience in importing and exporting data using Sqoop between HDFS/Hive and relational database systems and vice versa.
  • Extensive experience in building ETL jobs using Jupyter notebooks with Spark and Python.
  • Well versed in writing and using UDFs in both Hive and Pig using Java.
  • Good knowledge of querying data from Cassandra for searching, grouping and sorting.
  • Heavily used Jupyter Notebooks to analyze and connect the data from multiple sources.
  • Very good understanding of the various IAM modules such as Identity Management, Identity Governance, Access Management and Lifecycle Management.
  • Working knowledge of creating PIM reports.
  • Involved in developing Impala scripts for ad-hoc queries.
  • Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
  • Worked with different file formats like CSV, Text files, Sequence files, XML, JSON, Avro files.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Working experience with the Hortonworks distribution and the Cloudera Hadoop distribution versions CDH4 and CDH5 for executing the respective scripts.
  • Worked as a Cloud Administrator on Microsoft Azure, configuring virtual machines, storage accounts and resource groups.
  • Used AWS S3 and local hard disk as underlying file systems for Hadoop in place of HDFS.
  • Supported various reporting teams and have experience with the data visualization tool Tableau.
  • Designed and created ETL jobs in Talend to load large volumes of data into Cassandra, the Hadoop ecosystem and relational databases.
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce Programming Paradigm, High Availability and YARN architecture.
  • Expertise in developing Hive generic UDFs to implement complex business logic and incorporate it into HiveQL.
  • Worked with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN).
  • Expertise in implementing ad-hoc queries using HiveQL and good knowledge of creating Hive tables and loading and analyzing data using Hive queries.
  • Strong interpersonal and analytical skills, and a strong ability to perform as part of a team.
  • Experience with Agile Development process tools like Jira.
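
As a hedged illustration of the file-loading work described in the bullets above (for example, the CSV-to-Parquet item), the sketch below shows a minimal Spark job in Java; the class, path and column names are illustrative assumptions rather than details from the actual projects.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class CsvToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("CsvToParquet")
                .getOrCreate();

        // Load a delimited file into a DataFrame, letting Spark infer the schema
        Dataset<Row> input = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/landing/transactions.csv");  // hypothetical path

        // Simple validation: drop rows that are missing the key column
        Dataset<Row> valid = input.na().drop(new String[]{"transaction_id"});  // hypothetical column

        // Persist the result to HDFS in Parquet format
        valid.write().mode(SaveMode.Overwrite)
                .parquet("hdfs:///data/curated/transactions");

        spark.stop();
    }
}
```

The same read/validate/write pattern extends to the TXT, Avro and Parquet inputs mentioned above by swapping the reader format.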

TECHNICAL SKILLS:

Machine Learning: NumPy, Pandas, scikit-learn, deep learning (PyTorch, TensorFlow)

Hadoop Distribution: Cloudera (CDH3, CDH4, CDH5), Hortonworks

Languages: Java, Scala, Python

Data stores: MySQL, SQL Server

Big data: MapReduce, HDFS, Flume, Hive, Pig, Oozie, HBase, Sqoop, Spark, NiFi and Kafka

Cloud: Azure, AWS.

RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2

ETL: Talend and Informatica

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery and CSS, AngularJs, and JSON

Development/Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j

NoSQL Databases: Cassandra, MongoDB, HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, and Spring MVC

WORK EXPERIENCE:

Confidential - Rosemont, IL

Sr. Big Data Developer

  • Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Used Spark SQL on DataFrames to load Hive tables into Spark for faster data processing.
  • Implemented partitioning, dynamic partitioning and bucketing in Hive using internal and external tables for more efficient data access.
  • Working knowledge of Spark RDD, Data Frame API, Data set API, Data Source API, Spark SQL and Spark Streaming.
  • Used the Spark DataFrame API to process structured and semi-structured files and load them into an S3 bucket.
  • Used Spark DataFrame operations to perform required validations on the data and to perform analytics on the Hive data.
  • Used different Spark modules such as Spark Core, Spark SQL, Spark Streaming, Spark Datasets and DataFrames.
  • Played a key role in the design, deployment and testing of the IBM Security IAM suite, providing efficient user management through an innovative, enterprise-wide automated provisioning system.
  • Created solution architectures based on Microsoft Azure PaaS services.
  • Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra per the business requirements.
  • Tested cluster performance using the cassandra-stress tool to measure and improve read/write throughput.
  • Ran analytics on power plant data using the PySpark API with Jupyter notebooks on an on-premises cluster for certain transformation needs.
  • Used text mining and NLP techniques to determine sentiment about the organization.
  • Deployed a spam detection model and performed sentiment analysis of customer product reviews using NLP techniques.
  • Developed and implemented predictive models of website user behavior, URL categorization, social network analysis, social mining and search content based on large-scale machine learning.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames and saved it in Parquet format on HDFS (see the streaming sketch after this list).
  • Created end-to-end Spark-Solr applications using Scala to perform data cleansing, validation and transformation according to the requirements.
  • Worked on Apache Solr for indexing and load-balanced querying to search for specific data in larger datasets.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, and Scala.
  • Developed Oozie workflows to run multiple Hive, Pig, Sqoop and Spark jobs.
  • Streamed data in real time using Spark with Kafka.
  • Involved in creating a data lake by extracting customer data from various sources into HDFS, including Excel files, databases and server log data.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Created several jobs in Talend ETL tool to perform transformation on source files.
  • Used Spark transformations for Data Wrangling and ingesting the real-time data of various file formats.
  • Used HUE for running Hive queries. Created partitions according to day using Hive to improve performance.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
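
The Kafka and Spark Streaming bullet above is sketched below using Spark Structured Streaming; the broker address, topic name and HDFS paths are placeholders, and the original pipeline may equally have used the older DStream API.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaToHdfs {
    public static void main(String[] args) throws Exception {
        // Requires the spark-sql-kafka connector on the classpath
        SparkSession spark = SparkSession.builder()
                .appName("KafkaToHdfs")
                .getOrCreate();

        // Subscribe to a Kafka topic (broker and topic names are placeholders)
        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")
                .option("subscribe", "events")
                .load()
                .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");

        // Continuously append the feed to HDFS as Parquet, with checkpointing for fault tolerance
        StreamingQuery query = events.writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/streaming/events")
                .option("checkpointLocation", "hdfs:///checkpoints/events")
                .start();

        query.awaitTermination();
    }
}
```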

Environment: Hadoop, Map Reduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, Azure, GitHub, Talend Big Data Integration, Solr, Impala.

Confidential, Pleasant Prairie, WI

Sr. Big Data Developer

  • Worked with Hortonworks distribution of Hadoop for setting up the cluster and monitored it using Ambari.
  • Created a JDBC connection through Sqoop between Hortonworks and SQL Server.
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing (see the Hive-access sketch after this list).
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Extensively worked on Hive, Pig, Map Reduce, Sqoop, Oozie in an optimized way of distributed processing.
  • Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS.
  • Worked extensively on Hive to create, alter and drop tables and involved in writing hive queries.
  • Led the successful customization of an ISIM-based IAM solution to the client's specific requirements, integrating and supporting Exchange 2013, and worked on upgrading the IAM technology stack from ITIM 5.1 to ISIM 6.0.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Successfully migrated the data from AWS S3 source to the HDFS sink using Flume.
  • Involved in moving log files generated from various sources to HDFS for further processing through Flume.
  • Used Pig as an ETL tool to perform transformations, event joins, traffic filtering and pre-aggregations before storing the data in HDFS.
  • Wrote Hive and Pig scripts for joining the raw data with the lookup data and for some aggregative operations as per the business requirement.
  • Created a 4-node Hadoop cluster in the cloud using Cloudera Manager and set up Zeppelin and IPython notebooks to use Spark interactively.
  • Implemented Apache NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Read log files using Elasticsearch and Logstash, alerted users on issues, and saved the alert details to MongoDB for analysis.
  • Worked on extracting files from MongoDB through Sqoop, placing them in HDFS and processing them.
  • Wrote queries in MongoDB to generate reports to display in the dash board.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication and schema design.
  • Used Sqoop to import data from RDBMS to HDFS cluster using custom scripts.
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Developed Tableau workbooks from multiple data sources using Data Blending.
  • Developed Tableau visualization and dashboards using Tableau Desktop.
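
A minimal sketch of the Hive-access pattern referenced in the first bullets of this role (loading Hive tables into Spark for faster processing); the database, table and column names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveTableAnalysis {
    public static void main(String[] args) {
        // enableHiveSupport lets Spark SQL resolve tables registered in the Hive metastore
        SparkSession spark = SparkSession.builder()
                .appName("HiveTableAnalysis")
                .enableHiveSupport()
                .getOrCreate();

        // Pull a Hive table into a DataFrame and aggregate it in memory
        Dataset<Row> orders = spark.sql("SELECT * FROM sales_db.orders");  // hypothetical database/table
        Dataset<Row> dailyTotals = orders
                .groupBy("order_date")
                .sum("amount");

        dailyTotals.show(20, false);
        spark.stop();
    }
}
```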

Environment: Hadoop, MapReduce, YARN, Hive, Pig, Flume, Sqoop, AWS, Tableau, Core Java, Spark, Scala, MongoDB, Hortonworks, Elasticsearch 5.x, Eclipse.

Confidential, Boston, MA

Java Developer

  • Responsible for the setup of a 5-node development cluster for a proof of concept that was later implemented as a full-time project by Fortune Brands.
  • Responsible for Installation and configuration of Hive, Sqoop, Zookeeper, Knox and Oozie on the Hortonworks Hadoop cluster using Ambari.
  • Involved in extracting large sets of structured, semi structured and unstructured data.
  • Developed Sqoop scripts to import data from Oracle database and handled incremental loading on the point of sale tables.
  • Created Hive external tables and views, on the data imported into the HDFS.
  • Developed and implemented Hive scripts for transformations such as evaluation, filtering and aggregation.
  • Worked on partitioning Hive tables and running the scripts in parallel to reduce run-time of the scripts.
  • Developed user-defined functions (UDFs) in Java where required for Hive queries (see the UDF sketch after this list).
  • Worked with data in multiple file formats including Avro, Parquet, Sequence files, ORC and Text/ CSV.
  • Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
  • Worked on creating End-End data pipeline orchestration using Oozie.
  • Developed bash scripts to automate the above process of Extraction, Transformation and Loading.
  • Implemented Authentication and Authorization using Kerberos, Knox and Apache Ranger.
  • Very good experience in managing the Hadoop cluster using Ambari.
  • Created roles and user groups in Ambari for permitted access to Ambari functions.
  • Working knowledge of MapReduce and YARN architectures.
  • Working knowledge of ZooKeeper.
  • Working knowledge of Tableau.
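
A small example of the kind of Java UDF for Hive mentioned above, using the classic org.apache.hadoop.hive.ql.exec.UDF base class; the function name and normalization logic are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: trims and upper-cases a code column before analysis
@Description(name = "normalize_code",
             value = "_FUNC_(str) - trims and upper-cases a string code")
public class NormalizeCodeUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Once packaged into a jar, such a UDF is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then called like any built-in function in HiveQL.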

Confidential

Java Developer

  • Participated in major phases of the software development cycle, including requirement gathering, analysis and design, development, and unit testing, using Agile/Scrum methodologies.
  • Interacted with the onsite team on change requests and understood requirement changes in order to implement them.
  • Implemented the application with the Spring Framework for dependency injection and to provide abstraction between the presentation and persistence layers.
  • Developed multiple batch jobs using Spring Batch to import files of different formats such as XML, CSV, etc.
  • Designed and developed the database for the application in DB2 and integrated the Spring Framework with JPA for database operations.
  • Developed SQL and JPQL queries, triggers, views to interact with Database.
  • Developed JUnit test cases for Unit Testing and functional testing for modules and prepared Code Documentation for future reference and upgrades.
  • Used Log4j for logging warnings, errors, etc. Involved in defect fixing and maintenance.
  • Used Maven 3.0, Spring 3.0 and Eclipse to develop the code for batch jobs.
  • Involved in analyzing and fixing the root cause of technical issues and defects during development.
  • Involved in end-to-end backend development and deployment of the application; deployed the application using IBM WebSphere.
  • Maintained high-quality RESTful services guided by best practices from the Richardson Maturity Model. Designed REST web services supporting both XML and JSON for handling AJAX requests (see the controller sketch after this list).
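
A hedged sketch of a REST endpoint of the kind described in the last bullet, using Spring MVC with @ResponseBody; the Customer resource and URL are hypothetical, and JSON/XML rendering is assumed to come from the Jackson and JAXB message converters configured in the application.

```java
import javax.xml.bind.annotation.XmlRootElement;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
@RequestMapping("/api/customers")
public class CustomerController {

    // Returns JSON or XML depending on the Accept header of the AJAX request;
    // a real implementation would delegate to a JPA-backed service layer.
    @RequestMapping(value = "/{id}", method = RequestMethod.GET)
    @ResponseBody
    public Customer getCustomer(@PathVariable("id") long id) {
        return new Customer(id, "placeholder-name");  // hypothetical data
    }

    // Minimal resource class; @XmlRootElement enables JAXB XML rendering
    @XmlRootElement
    public static class Customer {
        private long id;
        private String name;

        public Customer() { }  // no-arg constructor required by JAXB
        public Customer(long id, String name) { this.id = id; this.name = name; }

        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }
}
```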

Environment: Java/J2EE, JPA, Spring 3.0, Hibernate, JSON, XML, Maven, Unix/Linux, JUnit 4, MySQL 5.1, Eclipse, RTC, Log4j, IBM WebSphere, AJAX.
