Sr. Big Data Developer Resume

Deerfield, IL

SUMMARY

  • Over 7 years of professional IT experience across all phases of the Software Development Life Cycle, with development experience in Big Data technologies and Java/J2EE
  • More than 5 years of experience in the ingestion, storage, querying, processing, and analysis of Big Data, including technologies such as Spark, Kafka, Azure, Sqoop, Hive, Pig, AWS, Impala, Tableau, Talend, Autosys, and NoSQL databases
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, MapReduce, HDFS Federation (Hadoop 2), High Availability, and YARN, along with a good understanding of workload management, scalability, and distributed platform architectures
  • Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra
  • Worked with multiple Hadoop distributions, including Cloudera, Hortonworks, MapR, and Apache
  • Worked on installing, configuring, supporting, and managing Hadoop clusters using the Cloudera (CDH 5.x), MapR, and Hortonworks distributions, as well as on Amazon Web Services (AWS)
  • Experience with AWS services such as EMR, EC2, S3, CloudFormation, RDS, and Redshift for fast and efficient processing of Big Data
  • Experienced in monitoring cluster status using Cloudera Manager, Ambari, and Ganglia
  • Experience extending Hive and Pig core functionality with custom UDFs and UDAFs (see the sketch after this list)
  • Good understanding of R Programming, Data Mining and Machine Learning techniques
  • Hands-on experience implementing SequenceFiles, combiners, counters, dynamic partitions, and bucketing for best practices and performance improvement
  • Worked on Docker-based containerized applications
  • Debugged MapReduce jobs using counters and MRUnit tests
  • Experience working with Teradata, Oracle, Netezza, SQL Server, and MySQL databases
  • Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho
  • Experience with version control tools such as SVN and Git (GitHub), JIRA/Mingle for issue tracking, and Crucible for code reviews
  • Worked with various tools and IDEs, including Eclipse, IBM Rational, Visio, Apache Ant (build tool), MS Office, PL/SQL Developer, and SQL*Plus
  • Worked in Agile environments with active Scrum participation
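
A custom Hive UDF like those mentioned above typically looks like the following minimal sketch (written in Scala, as with the other examples in this resume); the class name NormalizeText and the data-quality rule are illustrative assumptions, not code from any specific project.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical data-quality UDF: trims whitespace and lower-cases a string column.
// Extending org.apache.hadoop.hive.ql.exec.UDF is the classic way to define a simple Hive UDF.
class NormalizeText extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null                            // pass NULLs through unchanged
    else new Text(input.toString.trim.toLowerCase)
  }
}
```

Once packaged into a JAR, a UDF like this would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText' (the function and JAR names here are assumed for illustration).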

TECHNICAL SKILLS

Big Data Ecosystem: Spark, Kafka, Sqoop, Hive, MapReduce, Pig, Flume, Impala, Oozie, ZooKeeper, ELK, Solr, Storm, Ranger, Drill, Knox, Azure, Ambari

Hadoop Distributions: Cloudera, MapR and Hortonworks

Languages: Java, Scala, Python, SQL, JavaScript

NoSQL Databases: Cassandra, MongoDB and HBase

Development / Build Tools: Eclipse, Ant, Maven, Gradle, IntelliJ, JUnit and Log4j

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

RDBMS: Teradata, Oracle, MySQL and DB2

Operating systems: UNIX, Linux, macOS and Windows variants

ETL Tools: Talend, Informatica

PROFESSIONAL EXPERIENCE

Confidential, Deerfield, IL

Sr. Big data Developer

Responsibilities:

  • Responsible for creating, managing, and monitoring the test and production Elasticsearch clusters
  • Worked with IT Service Management and other groups to ensure that all events, incidents, and problems were resolved per the SLA
  • Implemented Spark applications in Scala, using the DataFrame and Spark SQL APIs for faster data processing (see the sketch after this list)
  • Developed multiple Spark jobs in PySpark for data cleaning and preprocessing
  • Developed Hive UDFs to handle data quality and create filtered datasets for further processing
  • Designed, deployed, and maintained cloud solutions on Microsoft Azure and its underlying technologies
  • Created Azure data pipelines using PySpark and configured Azure Active Directory
  • Responsible for collecting data from multiple sources and pushing it to Elasticsearch
  • Designed and implemented data models for Elasticsearch indexing and querying to handle millions of documents
  • Developed Logstash configurations to import Oracle and HDFS data into Elasticsearch
  • Wrote MapReduce/Pig programs for ETL and developed custom UDFs in Java
  • Responsible for collecting and loading logs into Elasticsearch using Logstash, Filebeat, and Metricbeat
  • Scheduled Sqoop jobs using ESP scheduler
  • Used Jira for task tracking and Bitbucket for version control
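
A minimal sketch of the Scala DataFrame / Spark SQL cleaning job referenced above; the Hive table raw_events and its columns (event_id, user_id, event_ts) are placeholders invented for illustration, not the production schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CleanEvents {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clean-events")
      .enableHiveSupport()          // read the source table from the Hive metastore
      .getOrCreate()

    // Hypothetical raw table: drop rows missing a key and normalise the timestamp.
    val raw = spark.table("raw_events")
    val cleaned = raw
      .filter(col("user_id").isNotNull)
      .withColumn("event_date", to_date(col("event_ts")))
      .dropDuplicates("event_id")

    // Expose the cleaned data to Spark SQL for downstream queries.
    cleaned.createOrReplaceTempView("events_clean")
    spark.sql("SELECT event_date, count(*) AS events FROM events_clean GROUP BY event_date")
      .write.mode("overwrite").saveAsTable("events_daily")

    spark.stop()
  }
}
```

Registering the cleaned DataFrame as a temporary view keeps the downstream aggregation in plain Spark SQL, mirroring the DataFrame/Spark SQL split described above.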

Environment: Azure Databricks, Kubernetes, Spark SQL, Elasticsearch, Logstash, Kibana, Filebeat, Metricbeat, Hive, Sqoop, HBase, PySpark, Shell Scripting, Java, JUnit, Oracle

Confidential, Greenwood Village, CO

Big data Developer

Responsibilities:

  • Researched and recommended a suitable technology stack for the Hadoop migration, considering the current enterprise architecture
  • Developed Spark code in Scala using the Spark shell, as per requirements
  • Worked on importing and exporting data into and out of HDFS and Hive using Sqoop
  • Worked on creating Hive tables and wrote Hive queries for data analysis to meet business requirements
  • Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency
  • Planned the Cassandra cluster, including data sizing estimation and identification of hardware requirements based on the estimated data size and transaction volume
  • Bulk loaded data into the Cassandra cluster using Java APIs
  • Developed real-time data processing applications in Scala and Python, implementing Spark Streaming from sources such as Kafka and JMS (see the sketch after this list)
  • Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions
  • Experience transferring data from different data sources into HDFS using Kafka producers, consumers, and brokers
  • Worked on installing and maintaining Cassandra by configuring the cassandra.yaml file as per the requirement
  • Designed data models in Cassandra and worked with the Cassandra Query Language (CQL)
  • Used the Spark - Cassandra Connector to load data to and from Cassandra
  • Responsible for performing reads and writes in Cassandra from a web application using Java JDBC connectivity
  • Worked with BI teams in generating the reports on Tableau
  • Worked closely with the application team to resolve issues related to Spark and CQL
  • Used Jira for task/bug tracking
  • Used GIT for version control
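
A minimal sketch of the Kafka-to-Cassandra streaming path described in this role, using Spark Streaming (spark-streaming-kafka-0-10) and the DataStax Spark-Cassandra Connector; the broker address, topic, keyspace/table, and the CSV message layout are all assumptions for illustration.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector._   // adds saveToCassandra to RDDs

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1")   // placeholder contact point

    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",              // placeholder broker list
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "events-consumer",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    // Assume simple "id,type,amount" CSV messages; parse and persist each micro-batch.
    stream.map(_.value.split(","))
      .filter(_.length == 3)
      .map(f => (f(0), f(1), f(2).toDouble))
      .foreachRDD(_.saveToCassandra("analytics", "events",
        SomeColumns("id", "event_type", "amount")))

    ssc.start()
    ssc.awaitTermination()
  }
}
```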

Environment: Spark, Databricks, Spark SQL, Scala, Kafka, Cassandra, Spark Streaming, AWS, Shell Scripting, Java, Oracle, Hadoop

Confidential, Franklin, TN

Spark/Hadoop Developer

Responsibilities:

  • Extensively used the Spark stack to develop preprocessing jobs, using the RDD, Dataset, and DataFrame APIs to transform data for upstream consumption
  • Worked on extracting and enriching HBase data across multiple tables using joins in Spark
  • Worked on writing APIs to load the processed data to HBase tables
  • Replaced existing MapReduce programs with Spark applications written in Scala
  • Built on-premises data pipelines using Kafka and Spark Streaming, consuming the feed from an API streaming gateway REST service
  • Experienced in writing Sqoop scripts to import data into Hive/HDFS from RDBMS
  • Good knowledge of the Kafka Streams API for data transformation
  • Implemented a logging framework with the ELK stack (Elasticsearch, Logstash, and Kibana) on AWS
  • Set up Spark on EMR to process large datasets stored in Amazon S3 (see the sketch after this list)
  • Used Talend tool to create workflows for processing data from multiple source systems
  • Created sample flows in Talend and StreamSets with custom-coded JARs, and compared the performance of StreamSets and Kafka Streams
  • Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources
  • Developed Hive Queries to analyze the data in HDFS to identify issues and behavioral patterns
  • Involved in writing optimized Pig Scripts along with developing and testing Pig Latin Scripts
  • Support the data team in optimizing the design, including partitioning, of S3 buckets, EMR Hive and Redshift tables
  • Automated complex workflows using Apache Airflow
  • Developed, tested, and deployed workflows to aggregate and move data from EMR to Redshift
  • Used Python Pandas and NumPy for data analysis, scraping, and parsing
  • Deployed applications using Jenkins integrated with Git version control
  • Participated in production support on a regular basis to support the Analytics platform
  • Used Rally for task/bug tracking
  • Used GIT for version control
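
A minimal sketch of the EMR/S3 processing pattern above: read raw data from S3, derive a partition column, and write Parquet back to S3 partitioned by date so that EMR Hive and Redshift-side queries can prune partitions. The bucket, paths, and column names are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object S3Preprocess {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("emr-s3-preprocess").getOrCreate()

    // Paths are placeholders; on EMR the s3:// (EMRFS) scheme resolves natively.
    val raw = spark.read.option("header", "true").csv("s3://example-bucket/raw/clicks/")

    val enriched = raw
      .withColumn("event_date", to_date(col("event_ts")))
      .filter(col("user_id").isNotNull)

    // Partition by date so downstream EMR Hive / Redshift queries can prune files.
    enriched.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-bucket/curated/clicks/")

    spark.stop()
  }
}
```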

Environment: MapR, HBase, Scala, Sqoop, AWS, Airflow, Redshift, Hive, Drill, Spark SQL, Spark Streaming, Kafka, Docker, Spark, Unix, Neo4j, Talend, Shell Scripting, Java

Confidential, Piscataway, NJ

Spark/Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data
  • Developed Spark jobs and Hive Jobs to summarize and transform data
  • Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements
  • Experienced in developing Spark code for data analysis in both Python and Scala
  • Built on-premises data pipelines using Kafka and Spark for real-time data analysis
  • Created reports in Tableau to visualize the resulting datasets, and tested native Drill, Impala, and Spark connectors
  • Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark
  • Analyzed the SQL scripts and designed the solution to implement using Scala
  • Implemented complex Hive UDFs to execute business logic within Hive queries
  • Responsible for bulk loading data into HBase using MapReduce by directly creating HFiles and loading them
  • Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC
  • Worked on Solr configuration and customization based on requirements
  • Worked on creating and managing shards and indexes in Solr
  • Worked on improving performance and tuning the Solr cluster for query response times and index size
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations with Hive and MapReduce, and then loading the transformed data back into HDFS
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing
  • Collected and aggregated large amounts of log data using Flume, staging the data in HDFS for further analysis
  • Responsible for developing the data pipeline by implementing Kafka producers and consumers (see the sketch after this list)
  • Performed data analysis on HBase using Apache Phoenix
  • Exported the analyzed data to Impala to generate reports for the BI team
  • Managing and reviewing Hadoop Log files to resolve any configuration issues
  • Developed a program to extract named entities from OCR files
  • Used Gradle for building and testing project
  • Fixed defects as needed during the QA phase, support QA testing, troubleshoot defects and identify the source of defects
  • Used Mingle and later moved to JIRA for task/bug tracking
  • Used GIT for version control
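
A minimal sketch of a Kafka producer for the ingestion pipeline mentioned above, using the standard kafka-clients API; the broker list, topic name, and JSON payload are illustrative assumptions.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object PipelineProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")        // placeholder broker list
    props.put("key.serializer", classOf[StringSerializer].getName)
    props.put("value.serializer", classOf[StringSerializer].getName)
    props.put("acks", "all")                                 // wait for full ISR acknowledgement

    val producer = new KafkaProducer[String, String](props)
    try {
      // Hypothetical topic and payload; the real pipeline would stream source records here.
      producer.send(new ProducerRecord[String, String](
        "ingest-events", "record-1", """{"id":1,"status":"ok"}"""))
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```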

Environment: Cloudera, Hadoop, AWS, Kafka, Spark, Scala, NiFi, HBase, Hive, Impala, Drill, Spark SQL, MapReduce, Sqoop, Oozie, Storm, Zeppelin, Mesos, Airflow, Docker, Redshift, PySpark, Solr, ZooKeeper, Tableau, Shell Scripting, Gerrit, Java, Redis

Confidential

Java/Hadoop Developer

Responsibilities:

  • Converted the existing relational database model to the Hadoop ecosystem
  • Installed and maintained the Cloudera Hadoop distribution
  • Worked on analyzing the Hadoop cluster and various Big Data analytics tools, including Pig, Hive, HBase, and Sqoop
  • Involved in loading the data from Linux file system to HDFS
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team
  • Used JUnit for server-side testing
  • Involved in running Hadoop jobs for processing millions of records of text data
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop
  • Involved in defining job flows, managing and reviewing log files
  • Monitored workload, job performance and capacity planning using Cloudera Manager
  • Implemented MapReduce programs to transform log data into a structured form for extracting user information
  • Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables
  • Developed several REST web services supporting both XML and JSON to perform tasks such as demand response management
  • Developed pages using JSP, JSTL, Spring tags, jQuery, and JavaScript, and used jQuery to make AJAX calls
  • Collected the log data from web servers and integrated into HDFS using Flume
  • Responsible to manage data coming from different sources
  • Extracted data from HBase and placed it into HDFS using Sqoop, pre-processing the data for analysis
  • Gained experience with NoSQL databases
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts
  • Designed and developed a Struts-like MVC 2 web framework using the front-controller design pattern, which has been used successfully in several production systems
  • Normalized Oracle database, conforming to design concepts and best practices
  • Developed prototype test screens in HTML and JavaScript
  • Developed the application by using the Spring MVC framework
  • Used Spring IoC to inject values for dynamic parameters
  • Developed a JUnit testing framework for unit-level testing
  • Actively involved in code review and bug fixing for improving the performance
  • Documented application for its functionality and its enhanced features
  • Created connections through JDBC and used JDBC callable statements to call stored procedures (a minimal sketch follows below)
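
A minimal sketch of the JDBC stored-procedure call pattern from the last bullet; the Oracle connection URL, credentials, and the GET_CUSTOMER_COUNT procedure are hypothetical, and the Oracle JDBC driver is assumed to be on the classpath.

```scala
import java.sql.DriverManager

object StoredProcCall {
  def main(args: Array[String]): Unit = {
    // Connection details are placeholders for an Oracle instance.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//db-host:1521/ORCL", "app_user", "secret")
    try {
      // GET_CUSTOMER_COUNT is a hypothetical stored procedure with one IN and one OUT parameter.
      val stmt = conn.prepareCall("{call GET_CUSTOMER_COUNT(?, ?)}")
      stmt.setString(1, "ACTIVE")
      stmt.registerOutParameter(2, java.sql.Types.INTEGER)
      stmt.execute()
      println(s"Active customers: ${stmt.getInt(2)}")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```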

Environment: MapReduce, HBase, Sqoop, Hadoop, JavaScript, HDFS, Pig, Hive, Python, Oozie, Flume, Toad, Putty, Struts, JSP, Spring, Servlets, WebSphere, HTML, XML, UNIX Shell Scripting, Linux, SQL
