Sr. Big Data Developer Resume

Deerfield, IL

SUMMARY

  • Over 7 years of professional IT experience across all phases of the Software Development Life Cycle, with development experience in Big Data technologies and Java/J2EE
  • More than 5 years of experience in the ingestion, storage, querying, processing, and analysis of Big Data, including technologies such as Spark, Kafka, Azure, Sqoop, Hive, Pig, AWS, Impala, Tableau, Talend, Autosys, and NoSQL databases
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, MapReduce, HDFS Federation (Hadoop 2), High Availability, and YARN, along with a good understanding of workload management, scalability, and distributed platform architectures
  • Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra
  • Worked with multiple Hadoop distributions, including Cloudera, Hortonworks, MapR, and Apache
  • Worked on installing, configuring, supporting, and managing Hadoop clusters using the Cloudera (CDH 5.x), MapR, and Hortonworks distributions, as well as on Amazon Web Services (AWS)
  • Experience with AWS services such as EMR, EC2, S3, CloudFormation, RDS, and Redshift for fast and efficient processing of Big Data
  • Experienced in monitoring cluster status using Cloudera Manager, Ambari, and Ganglia
  • Experience extending Hive and Pig core functionality with custom UDFs and UDAFs (see the sketch after this list)
  • Good understanding of R Programming, Data Mining and Machine Learning techniques
  • Hands-on experience implementing SequenceFiles, combiners, counters, dynamic partitions, and bucketing for best practices and performance improvement
  • Worked on Docker-based containerized applications
  • Debugged MapReduce jobs using counters and MRUnit tests
  • Experience working with Teradata, Oracle, Netezza, SQL Server, and MySQL databases
  • Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho
  • Experience with version control tools such as SVN and Git (GitHub), JIRA/Mingle for issue tracking, and Crucible for code reviews
  • Worked with various tools and IDEs, including Eclipse, IBM Rational, Visio, Apache Ant (build tool), MS Office, PL/SQL Developer, and SQL*Plus
  • Worked in Agile environments with active Scrum participation
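
A custom Hive UDF like those mentioned above typically looks like the following minimal sketch (written in Scala, as with the other examples in this resume); the class name NormalizeText and the data-quality rule are illustrative assumptions, not code from any specific project.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical data-quality UDF: trims whitespace and lower-cases a string column.
// Extending org.apache.hadoop.hive.ql.exec.UDF is the classic way to define a simple Hive UDF.
class NormalizeText extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null                            // pass NULLs through unchanged
    else new Text(input.toString.trim.toLowerCase)
  }
}
```

Once packaged into a JAR, a UDF like this would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText' (the function and JAR names here are assumed for illustration).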

TECHNICAL SKILLS

Big Data Ecosystem: Spark, Kafka, Sqoop, Hive, MapReduce, Pig, Flume, Impala, Oozie, ZooKeeper, ELK, Solr, Storm, Ranger, Drill, Knox, Azure, Ambari

Hadoop Distributions: Cloudera, MapR and Hortonworks

Languages: Java, Scala, Python, SQL, JavaScript

NoSQL Databases: Cassandra, MongoDB and HBase

Development / Build Tools: Eclipse, Ant, Maven, Gradle, IntelliJ, JUnit and Log4j

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

RDBMS: Teradata, Oracle, MySQL and DB2

Operating systems: UNIX, Linux, macOS and Windows variants

ETL Tools: Talend, Informatica

PROFESSIONAL EXPERIENCE

Confidential, Deerfield, IL

Sr. Big data Developer

Responsibilities:

  • Responsible for creating, managing, and monitoring the test and production Elasticsearch clusters
  • Worked with IT Service Management and other groups to ensure that all events, incidents, and problems were resolved per the SLA
  • Implemented Spark applications in Scala, using the DataFrame and Spark SQL APIs for faster data processing (see the sketch after this list)
  • Developed multiple Spark jobs in PySpark for data cleaning and preprocessing
  • Developed Hive UDFs to handle data quality and create filtered datasets for further processing
  • Designed, deployed, and maintained cloud solutions on Microsoft Azure and its underlying technologies
  • Created Azure data pipelines using PySpark and configured Azure Active Directory
  • Responsible for collecting data from multiple sources and pushing it to Elasticsearch
  • Designed and implemented data models for Elasticsearch indexing and querying to handle millions of documents
  • Developed Logstash configurations to import Oracle and HDFS data into Elasticsearch
  • Wrote MapReduce/Pig programs for ETL and developed custom UDFs in Java
  • Responsible for collecting and loading logs into Elasticsearch using Logstash, Filebeat, and Metricbeat
  • Scheduled Sqoop jobs using ESP scheduler
  • Used Jira for task tracking and Bitbucket for version control
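
A minimal sketch of the Scala DataFrame / Spark SQL cleaning job referenced above; the Hive table raw_events and its columns (event_id, user_id, event_ts) are placeholders invented for illustration, not the production schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CleanEvents {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clean-events")
      .enableHiveSupport()          // read the source table from the Hive metastore
      .getOrCreate()

    // Hypothetical raw table: drop rows missing a key and normalise the timestamp.
    val raw = spark.table("raw_events")
    val cleaned = raw
      .filter(col("user_id").isNotNull)
      .withColumn("event_date", to_date(col("event_ts")))
      .dropDuplicates("event_id")

    // Expose the cleaned data to Spark SQL for downstream queries.
    cleaned.createOrReplaceTempView("events_clean")
    spark.sql("SELECT event_date, count(*) AS events FROM events_clean GROUP BY event_date")
      .write.mode("overwrite").saveAsTable("events_daily")

    spark.stop()
  }
}
```

Registering the cleaned DataFrame as a temporary view keeps the downstream aggregation in plain Spark SQL, mirroring the DataFrame/Spark SQL split described above.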

Environment: Azure Databricks, Kubernetes, Spark SQL, Elasticsearch, Logstash, Kibana, Filebeat, Metricbeat, Hive, Sqoop, HBase, PySpark, Shell Scripting, Java, JUnit, Oracle

Confidential, Greenwood Village, CO

Big data Developer

Responsibilities:

  • Researched and recommended a suitable technology stack for the Hadoop migration, considering the current enterprise architecture
  • Developed Spark code in Scala using the Spark shell, as per requirements
  • Worked on importing and exporting data into and out of HDFS and Hive using Sqoop
  • Worked on creating Hive tables and wrote Hive queries for data analysis to meet business requirements
  • Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency
  • Planned the Cassandra cluster, including data sizing estimation and identification of hardware requirements based on the estimated data size and transaction volume
  • Bulk loaded data into the Cassandra cluster using Java APIs
  • Developed real-time data processing applications in Scala and Python, implementing Spark Streaming from sources such as Kafka and JMS (see the sketch after this list)
  • Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions
  • Experience transferring data from different data sources into HDFS using Kafka producers, consumers, and brokers
  • Worked on installing and maintaining Cassandra by configuring the cassandra.yaml file as per the requirement
  • Designed data models in Cassandra and worked with the Cassandra Query Language (CQL)
  • Used the Spark - Cassandra Connector to load data to and from Cassandra
  • Responsible for performing reads and writes in Cassandra from a web application using Java JDBC connectivity
  • Worked with BI teams in generating the reports on Tableau
  • Worked closely with the application team to resolve issues related to Spark and CQL
  • Used Jira for task/bug tracking
  • Used GIT for version control
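
A minimal sketch of the Kafka-to-Cassandra streaming path described in this role, using Spark Streaming (spark-streaming-kafka-0-10) and the DataStax Spark-Cassandra Connector; the broker address, topic, keyspace/table, and the CSV message layout are all assumptions for illustration.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector._   // adds saveToCassandra to RDDs

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1")   // placeholder contact point

    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",              // placeholder broker list
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "events-consumer",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    // Assume simple "id,type,amount" CSV messages; parse and persist each micro-batch.
    stream.map(_.value.split(","))
      .filter(_.length == 3)
      .map(f => (f(0), f(1), f(2).toDouble))
      .foreachRDD(_.saveToCassandra("analytics", "events",
        SomeColumns("id", "event_type", "amount")))

    ssc.start()
    ssc.awaitTermination()
  }
}
```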

Environment: Spark, Databricks, Spark SQL, Scala, Kafka, Cassandra, Spark Streaming, AWS, Shell Scripting, Java, Oracle, Hadoop

Confidential, Franklin, TN

Spark/Hadoop Developer

Responsibilities:

  • Extensively used the Spark stack to develop preprocessing jobs, using the RDD, Dataset, and DataFrame APIs to transform data for upstream consumption
  • Worked on extracting and enriching HBase data across multiple tables using joins in Spark
  • Worked on writing APIs to load the processed data to HBase tables
  • Replaced existing MapReduce programs with Spark applications written in Scala
  • Built on-premises data pipelines using Kafka and Spark Streaming, consuming the feed from an API streaming gateway REST service
  • Experienced in writing Sqoop scripts to import data into Hive/HDFS from RDBMS
  • Good knowledge of the Kafka Streams API for data transformation
  • Implemented a logging framework with the ELK stack (Elasticsearch, Logstash, and Kibana) on AWS
  • Set up Spark on EMR to process large datasets stored in Amazon S3 (see the sketch after this list)
  • Used Talend tool to create workflows for processing data from multiple source systems
  • Created sample flows in Talend and StreamSets with custom-coded JARs, and compared the performance of StreamSets and Kafka Streams
  • Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources
  • Developed Hive Queries to analyze the data in HDFS to identify issues and behavioral patterns
  • Involved in writing optimized Pig Scripts along with developing and testing Pig Latin Scripts
  • Support the data team in optimizing the design, including partitioning, of S3 buckets, EMR Hive and Redshift tables
  • Automated complex workflows using Apache Airflow
  • Developed, tested, and deployed workflows to aggregate and move data from EMR to Redshift
  • Used Python Pandas and NumPy for data analysis, scraping, and parsing
  • Deployed applications using Jenkins integrated with Git version control
  • Participated in production support on a regular basis to support the Analytics platform
  • Used Rally for task/bug tracking
  • Used GIT for version control
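
A minimal sketch of the EMR/S3 processing pattern above: read raw data from S3, derive a partition column, and write Parquet back to S3 partitioned by date so that EMR Hive and Redshift-side queries can prune partitions. The bucket, paths, and column names are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object S3Preprocess {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("emr-s3-preprocess").getOrCreate()

    // Paths are placeholders; on EMR the s3:// (EMRFS) scheme resolves natively.
    val raw = spark.read.option("header", "true").csv("s3://example-bucket/raw/clicks/")

    val enriched = raw
      .withColumn("event_date", to_date(col("event_ts")))
      .filter(col("user_id").isNotNull)

    // Partition by date so downstream EMR Hive / Redshift queries can prune files.
    enriched.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-bucket/curated/clicks/")

    spark.stop()
  }
}
```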

Environment: MapR, HBase, Scala, Sqoop, AWS, Airflow, Redshift, Hive, Drill, Spark SQL, Spark Streaming, Kafka, Docker, Spark, Unix, Neo4j, Talend, Shell Scripting, Java

Confidential, Piscataway, NJ

Spark/Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data
  • Developed Spark jobs and Hive Jobs to summarize and transform data
  • Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements
  • Experienced in developing Spark code for data analysis in both Python and Scala
  • Built on-premises data pipelines using Kafka and Spark for real-time data analysis
  • Created reports in Tableau to visualize the resulting datasets, and tested native Drill, Impala, and Spark connectors
  • Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark
  • Analyzed the SQL scripts and designed the solution to implement using Scala
  • Implemented complex Hive UDFs to execute business logic within Hive queries
  • Responsible for bulk loading data into HBase using MapReduce by directly creating HFiles and loading them
  • Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC
  • Worked on Solr configuration and customization based on requirements
  • Worked on creating and managing shards and indexes in Solr
  • Worked on improving performance and tuning the Solr cluster for query response times and index size
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations with Hive and MapReduce, and then loading the transformed data back into HDFS
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing
  • Collected and aggregated large amounts of log data using Flume, staging the data in HDFS for further analysis
  • Responsible for developing the data pipeline by implementing Kafka producers and consumers (see the sketch after this list)
  • Performed data analysis on HBase using Apache Phoenix
  • Exported the analyzed data to Impala to generate reports for the BI team
  • Managing and reviewing Hadoop Log files to resolve any configuration issues
  • Developed a program to extract named entities from OCR files
  • Used Gradle for building and testing project
  • Fixed defects as needed during the QA phase, support QA testing, troubleshoot defects and identify the source of defects
  • Used Mingle and later moved to JIRA for task/bug tracking
  • Used GIT for version control
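
A minimal sketch of a Kafka producer for the ingestion pipeline mentioned above, using the standard kafka-clients API; the broker list, topic name, and JSON payload are illustrative assumptions.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object PipelineProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")        // placeholder broker list
    props.put("key.serializer", classOf[StringSerializer].getName)
    props.put("value.serializer", classOf[StringSerializer].getName)
    props.put("acks", "all")                                 // wait for full ISR acknowledgement

    val producer = new KafkaProducer[String, String](props)
    try {
      // Hypothetical topic and payload; the real pipeline would stream source records here.
      producer.send(new ProducerRecord[String, String](
        "ingest-events", "record-1", """{"id":1,"status":"ok"}"""))
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```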

Environment: Cloudera, Hadoop, AWS, Kafka, Spark, Scala, NiFi, HBase, Hive, Impala, Drill, Spark SQL, MapReduce, Sqoop, Oozie, Storm, Zeppelin, Mesos, Airflow, Docker, Redshift, PySpark, Solr, ZooKeeper, Tableau, Shell Scripting, Gerrit, Java, Redis

Confidential

Java/Hadoop Developer

Responsibilities:

  • Converted the existing relational database model to the Hadoop ecosystem
  • Installed and maintained the Cloudera Hadoop distribution
  • Worked on analyzing the Hadoop cluster and various Big Data analytics tools, including Pig, Hive, HBase, and Sqoop
  • Involved in loading the data from Linux file system to HDFS
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team
  • Used JUnit for server-side testing
  • Involved in running Hadoop jobs for processing millions of records of text data
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop
  • Involved in defining job flows, managing and reviewing log files
  • Monitored workload, job performance and capacity planning using Cloudera Manager
  • Implemented MapReduce programs to transform log data into a structured form for extracting user information
  • Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables
  • Developed several REST web services supporting both XML and JSON to perform tasks such as demand response management
  • Developed pages using JSP, JSTL, Spring tags, jQuery, and JavaScript, and used jQuery to make AJAX calls
  • Collected the log data from web servers and integrated into HDFS using Flume
  • Responsible to manage data coming from different sources
  • Extracted data from HBase and placed it into HDFS using Sqoop, pre-processing the data for analysis
  • Gained experience with NoSQL databases
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts
  • Designed and developed a Struts-like MVC 2 web framework using the front-controller design pattern, which has been used successfully in several production systems
  • Normalized Oracle database, conforming to design concepts and best practices
  • Developed prototype test screens in HTML and JavaScript
  • Developed the application by using the Spring MVC framework
  • Used Spring IoC to inject values for dynamic parameters
  • Developed a JUnit testing framework for unit-level testing
  • Actively involved in code review and bug fixing for improving the performance
  • Documented application for its functionality and its enhanced features
  • Created connections through JDBC and used JDBC callable statements to call stored procedures (a minimal sketch follows below)
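
A minimal sketch of the JDBC stored-procedure call pattern from the last bullet; the Oracle connection URL, credentials, and the GET_CUSTOMER_COUNT procedure are hypothetical, and the Oracle JDBC driver is assumed to be on the classpath.

```scala
import java.sql.DriverManager

object StoredProcCall {
  def main(args: Array[String]): Unit = {
    // Connection details are placeholders for an Oracle instance.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//db-host:1521/ORCL", "app_user", "secret")
    try {
      // GET_CUSTOMER_COUNT is a hypothetical stored procedure with one IN and one OUT parameter.
      val stmt = conn.prepareCall("{call GET_CUSTOMER_COUNT(?, ?)}")
      stmt.setString(1, "ACTIVE")
      stmt.registerOutParameter(2, java.sql.Types.INTEGER)
      stmt.execute()
      println(s"Active customers: ${stmt.getInt(2)}")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```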

Environment: MapReduce, HBase, Sqoop, Hadoop, JavaScript, HDFS, Pig, Hive, Python, Oozie, Flume, Toad, Putty, Struts, JSP, Spring, Servlets, WebSphere, HTML, XML, UNIX Shell Scripting, Linux, SQL
