Sr. Hadoop Developer/ Big Data Resume Baltimore, MD - Hire IT People

SUMMARY

Around 9 years of IT industry experience, including 6 years working with Apache Hadoop components like HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Oozie, Zookeeper, HBase, Cassandra, MongoDB and Amazon Web Services.
Experience in performing in-memory data processing and real-time streaming analytics using Apache Spark with Scala, Java and Python.
Developed applications for Distributed Environment using Hadoop, MapReduce and Python.
Developed MapReduce jobs to automate transfer of data from HBase.
Developed and maintained web applications using the Tomcat web server.
Experience in integrating Hadoop with Ganglia and have good understanding of Hadoop metrics and visualization using Ganglia.
Good experience working with Hortonworks Distribution and Cloudera Distribution.
Very good understanding/knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
Experience in data extraction and transformation using MapReduce jobs.
Proficient in working with Hadoop, HDFS, writing PIG scripts and Sqoop scripts.
Performed data analysis using Hive and Pig.
Expert in creating Pig and Hive UDFs using Java in order to analyze the data efficiently.
Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
Strong understanding of NoSQL databases like HBase, MongoDB and Cassandra.
Experience in working on various Hadoop data access components like MapReduce, Pig, Hive, HBase, Spark and Kafka.
Strong understanding of Spark real time streaming and SparkSQL and experience in loading data from external data sources like MySQL and Cassandra for Spark applications.
Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
Intensive working experience with Amazon Web Services (AWS), using S3 for storage, EC2 for computing, and RDS and EBS.
Excellent programming skills at a higher level of abstraction using Scala and Java.
Well versed with job workflow scheduling and monitoring tools like Oozie.
Loaded streaming log data from various web servers into HDFS using Flume.
Experience in using Sqoop, Oozie and Cloudera Manager.
Experience with source control repositories like SVN, CVS and Git.
Experience in improving the search focus and quality in ElasticSearch by using aggregations and Python scripts.
Hands on experience in application development using RDBMS, and Linux shell scripting.
Have experience with working on Amazon EMR and EC2 Spot instances.
Solid understanding of relational database concepts.
Extensively worked with Unified Modeling Language (UML) tools in designing use cases, activity flow diagrams, class diagrams, sequence diagrams and object diagrams using Rational Rose and MS Visio.
Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
Good knowledge of Hadoop cluster architecture and cluster monitoring.
Adequate knowledge and working experience in Agile & Waterfall methodologies.
Support development, testing, and operations teams during new system deployments.
Practical knowledge on implementing Kafka with third-party systems, such as Spark and Hadoop.
Good team player and can work efficiently in multiple team environments and multiple products. Easily adaptable to the new systems and environments.
Possess excellent communication and analytical skills along with a can-do attitude.

TECHNICAL SKILLS

Programming languages: C, Java, Python, Scala, SQL

HADOOP/BIG DATA: MapReduce, Spark, SparkSQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, Zookeeper, ElasticSearch

Databases: MySQL, PL/SQL, MongoDB, HBase, Cassandra.

Operating Systems: Windows, Unix, Linux, Ubuntu.

Web Development: HTML, JSP, JavaScript, JQuery, CSS, XML, AJAX.

Web/Application Servers: Apache Tomcat, Sun Java Application Server

Tools: IntelliJ, Eclipse, NetBeans, Nagios, Ganglia, Maven

Scripting: BASH, JavaScript

Version Controls: GIT, SVN

PROFESSIONAL EXPERIENCE

Sr. Hadoop Developer/ Big Data

Confidential - Baltimore, MD

Responsibilities:

Analyzed SQL scripts and designed the solution to implement using PySpark.
Worked with Spark Streaming to consume live data from Kafka and store the stream data in HDFS.
Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
Completed data extraction, aggregation and analysis in HDFS using PySpark and stored the required data in Hive.
Worked on MySQL database to retrieve information from storage using Python.
Implemented and worked on Python code using shell scripting.
Responsible for developing data pipelines using Apache NiFi and Spark/Scala to extract data from vendors and store it in HDFS and Redshift.
Developed Python code to gather data from HBase (Cornerstone) and designed the solution to implement using PySpark.
Worked on integrating Python with Web development tools for developing Web Services in Python using XML, JSON.
Optimized poorly written Spark/Scala jobs to use more parallelism by monitoring the YARN UI.
Worked with Python's unittest and pytest frameworks.
Used Python/ HTML / CSS to help the team implement dozens of new features in a massively scaled Google App Engine web application.
Loaded all datasets from source CSV files into Hive and Cassandra using Spark/PySpark.
Installed and configured Hadoop MapReduce, HDFS, Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
Strong experience with Python automation, including REST API automation and UI automation using Selenium WebDriver.
Good knowledge of Apache Cassandra for storing data in a cluster.
Developed and analyzed SQL scripts and designed the solution to implement using PySpark.
Built an Ingestion Framework that would ingest the files from SFTP to HDFS using Apache NIFI and ingest financial data into HDFS.
Developed a fully automated continuous integration system using Git, Jenkins.
Performed job functions using Spark API's in Scala for real time analysis and for fast querying purposes.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
Designed and developed scalable Azure APIs using Flask web framework in Python and Integrated with Azure API Management, Logical Apps and other Azure services.
Perform data analysis on NoSQL databases such as HBase and Cassandra.
Provided a responsive, AJAX-based, test-driven design using JavaScript libraries such as jQuery, AngularJS and Bootstrap.js.
Used Subversion for version control.
Developed and implemented core API services using Python and Spark (PySpark).
Used PySpark to process and analyze the data.
Wrote test cases using the PyUnit test framework and Selenium automation testing for better manipulation of test scripts.
Leveraged cloud and GPU computing technologies, such as AWS and GCP, for automated machine learning and analytics pipelines.
Conducted systems design and feasibility studies, and recommended cost-effective cloud solutions such as Amazon Web Services (AWS), Microsoft Azure and Rackspace.
Worked on Micro services for Continuous Delivery environment using Docker and Jenkins.
Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
Developed and deployed data pipelines in cloud environments such as AWS and GCP.
Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
Performed batch processing of data sources using Apache Spark and Elasticsearch.
Implemented a Python-based distributed random forest via Python streaming.
Worked on the MySQL migration project to make the system completely independent of the database being used.

Environment: Python, MapReduce, Spark, Hadoop, HBase, Scala, Kafka, Hive, Pig, Sqoop, RBAC, ACL, Pandas, Docker, ReactJS, Google APIs, SOAP, REST, IntelliJ, Azure APIs, Shell Scripting, Selenium, AWS.

Sr. Hadoop Developer/ Big Data

Confidential - Denver, CO

Responsibilities:

Involved in installation, configuration and maintenance of Hadoop clusters for application development with Cloudera distribution.
Developed a Kafka consumer API in Scala for consuming data from Kafka topics.
Developed end-to-end scalable distributed data pipelines that receive data through the Kafka distributed messaging system and persist it into HDFS with Apache Spark using Scala.
Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
Worked on a framework that requires only the table names, schemas, source file locations, Sqoop parameters, etc., and generates the entire code, including workflow.xml.
Performed advanced operations like text analytics and processing, using in-memory computing capabilities of Spark using Scala.
Experience in querying data using Spark SQL and implementing Spark RDDs in Scala.
Experienced in working with different scripting technologies like Python, UNIX shell scripts.
Performed POC on writing the spark applications in Scala, Python and R programming language.
Worked on Partitioning, Bucketing, Parallel execution, Map side Joins for optimization of necessary hive queries.
Used HiveQL to create Hive tables and write Hive queries for data analysis.
Collected log data from web servers and pushed it to HDFS and the NoSQL database Cassandra using Flume.
Used Oozie workflows to manage and schedule jobs on a Hadoop cluster and used Zookeeper for cluster coordination services.
Started using Apache NiFi to copy the data from local file system to HDP.
Implemented best offer logic using Pig scripts and Pig UDFs.
Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
Used NiFi for the transformation of data from different components of the Big Data ecosystem.
Worked on different data sources like Oracle, Netezza, MySQL and flat files, and gained experience with AWS components like Amazon EC2 instances, S3 buckets and CloudFormation templates.
Developed Python scripts to update content in the database and manipulate files.
Used Qlik Sense to build customized interactive reports, worksheets, and dashboards.
Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
Developed and implemented core API services using Python and Spark (PySpark).
Strong expertise on MapReduce programming model with XML, JSON, CSV file formats.
Involved in managing and organizing developers with regular code review sessions, utilizing Agile and Scrum methodologies.
Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
Experience in implementing Spark RDD transformations, actions, DataFrames and case classes for the required data using Spark Core.
Migrated the computational code from HQL to PySpark.
Completed data extraction, aggregation and analysis in HDFS using PySpark and stored the required data in Hive.
Provided support to data analysts in running Pig and Hive queries.
Worked with HiveQL and Pig Latin.
Performed various POCs in data ingestion, data analysis and reporting using Hadoop, MapReduce, Hive, Pig, Sqoop, Flume and Elasticsearch.
Used Jira for bug tracking and Bitbucket for source code hosting and code review.
Performed data migration to GCP.
Implemented an Apache Airflow DAG to find popular items in Redshift and ingest them into the main PostgreSQL database via a web service call.
Implemented Spark applications in a data processing project to handle data from various sources, creating DStreams and DataFrames from input data received from streaming services like Kafka.

Environment: MapReduce, HDFS, Hive, Spark, Spark SQL, Sqoop, IntelliJ, Apache Kafka, Java 7, Cassandra, Scala, Apache Pig, Oozie, NiFi, Linux, AWS EC2, Agile development, Oracle 11g/10g, UNIX Shell scripting, Ambari, Tez, Eclipse, Qlik Sense and Cloudera.

Hadoop Developer/ Big Data

Confidential - New York, NY

Responsibilities:

Used Cassandra Query Language to design Cassandra database and tables with various configuration options.
Developed Pig UDFs for manipulating the data according to business requirements and also worked on developing custom Pig loaders.
Experience in integrating Hadoop with Ganglia and have good understanding of Hadoop metrics and visualization using Ganglia.
Involved in the review of functional and non-functional requirements.
Practical experience in developing Spark applications in Eclipse with Maven.
Strong understanding of Spark real time streaming and SparkSQL.
Used Apache Kafka for collecting, aggregating, and moving large amounts of data from application servers.
Loading data from external data sources like MySQL and Cassandra for Spark applications.
Developed Python and Shell scripts to automate the end-to-end implementation process of AI project.
Experience in selecting and configuring the right Amazon EC2 instances and accessing key AWS services using client tools and AWS SDKs.
Responsible for setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
Knowledge of using AWS Identity and Access Management (IAM) to secure access to EC2 instances and of configuring auto-scaling groups using CloudWatch.
Firm understanding of optimizations and performance-tuning practices while working with Spark.
Good knowledge of compression and serialization to improve performance in Spark applications.
Performed interactive querying using SparkSQL.
Involved in designing Kafka for multi data center cluster and monitoring it.
Responsible for importing real time data to pull the data from sources to Kafka clusters.
Developed predictive analytics using Apache Spark Scala APIs.
Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
Strong expertise on MapReduce programming model with XML, JSON, CSV file formats.
Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
Practical knowledge on Apache Sqoop to import datasets from MySQL to HDFS and vice-versa.
Good knowledge on building predictive models focusing on customer service using R programming.
Experience in reviewing and managing Hadoop log files.
Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
Experience in building batch and streaming applications with Apache Spark and Python.
Used the libraries built on MLlib to perform data cleaning and used R programming for dataset reorganizing.
Debugged CQL queries and implemented performance enhancement practices.
Strong knowledge on Apache Oozie for scheduling the tasks.
Practical knowledge on implementing Kafka with third-party systems, such as Spark and Hadoop.
Experience in configuring Kafka brokers, consumers and producers for optimal performance.
Knowledge of creating Apache Kafka consumers and producers in Java.
Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
Practical knowledge of monitoring a Hadoop cluster using Nagios and Ganglia.
Experience with GIT for version control system.
Involved in loading data from UNIX file system to HDFS and developed UNIX scripts for job scheduling, process management and for handling logs from Hadoop.
Understanding technical specifications and documenting technical design documents.
Strong skills in agile development and Test-Driven development.
Practical knowledge of implementing Internet of Things (IoT) solutions.

Environment: Hadoop Cloudera Distribution (CDH4), Java 7, Hadoop, Spark, SparkSQL, MLlib, R programming, Scala, Cassandra, IoT, MapReduce, Apache Pig, Apache Hive, HDFS, Sqoop, IntelliJ, Oozie, Kafka, Maven, Eclipse, Nagios, Ganglia, Zookeeper, AWS EC2, GIT, Ambari, Tez, UNIX Shell scripting, Oracle 11g/10g, Linux, Agile development.

Hadoop Developer / Big Data

Confidential - Newport Beach, CA

Responsibilities:

Developed Spark scripts by using Scala as per the requirement.
Analyzed large data sets to determine the optimal way to aggregate and report on them.
Designed and implemented incremental imports into Hive tables.
Developed Apache Pig scripts and Hive scripts to process the HDFS data.
Involved in defining job flows, managing and reviewing log files.
Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
Supported MapReduce programs running on the cluster.
As a Big Data Developer, implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, MapReduce Frameworks, HBase, Hive, Oozie, Flume, Sqoop etc.
Configured, deployed and maintained multi-node Dev and Test Kafka clusters and implemented data ingestion and cluster handling for real-time processing using Kafka.
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
Developed Spark scripts by using Scala shell commands as per the requirement.
Imported Bulk Data into HBase Using Map Reduce programs.
Performed analytics on time series data stored in HBase using the HBase API.
Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
Extracted the data from Teradata into HDFS/Databases/Dashboards using Spark Streaming.
Responsible for continuous monitoring and managing Elastic MapReduce cluster through AWS console.
Wrote multiple Java programs to pull data from HBase.
Involved with File Processing using Pig Latin.
Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
Created applications using Kafka that monitor consumer lag within Apache Kafka clusters.
Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
Worked on debugging and performance tuning of Hive and Pig jobs.
Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
Worked with the Spark ecosystem using Scala and Hive queries on different data formats like text files and Parquet.
Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.

Environment: Java, Hadoop, Map Reduce, Pig, Hive, Linux, Sqoop, Flume, Eclipse, AWS EC2, and Cloudera CDH 4.

Java/Hadoop Developer

Confidential

Responsibilities:

Processed data into HDFS by developing solutions.
Analyzed the data using MapReduce, Pig and Hive and produced summary results from Hadoop for downstream systems.
Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
Created Hive tables and involved in data loading and writing Hive UDFs.
Exported the analyzed data to the relational database MySQL using Sqoop for visualization and to generate reports.
Created HBase tables to load large sets of structured data.
Managed and reviewed Hadoop log files.
Worked on data ingestion to Kafka and on processing and storing the data using Spark Streaming.
Involved in providing inputs for estimate preparation for the new proposal.
Worked extensively with Hive DDLs and Hive Query Language (HQL).
Developed UDF, UDAF and UDTF functions and used them in Hive queries.
Implemented Sqoop for large dataset transfers between Hadoop and RDBMSs.
Created MapReduce jobs to convert periodic XML messages into partitioned Avro data.
Used Sqoop widely in order to import data from various systems/sources (like MySQL) into HDFS.
Created components like Hive UDFs for functionality missing in Hive for analytics.
Developing Scripts and Batch Job to schedule a bundle (group of coordinators) which consists of various.
Used different file formats like Text files, Sequence Files, Avro.
Provided cluster coordination services through Zookeeper.
Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Assisted in Cluster maintenance, cluster monitoring, adding and removing cluster nodes and
Installed and configured Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.

Environment: Hadoop, HDFS, Map Reduce, Hive, Kafka, Pig, Sqoop, HBase, Shell Scripting, Oozie, Oracle 11g.

Software Engineer

Confidential

Responsibilities:

Designed, developed and executed data migration from a Db2 database to an Oracle database using Linux scripts, Java and SQL*Loader concepts.
Played a key role on the team in articulating the design requirements for the development of automated tools that perform error-free configuration.
Developed UNIX and Java utilities for data migration from Db2 to Oracle. Sole developer and POC for the migration activity.
Developed JSP pages, Servlets and HTML pages as per requirement.
Developed the necessary Java Beans, PL/SQL procedures for the implementation of business rules.
Developed the user interface using JavaServer Pages (JSP), HTML and JavaScript for the presentation tier.
Developed JSP pages and client-side validation using JavaScript.
Developed a custom realm for Apache Tomcat Server to authenticate users.
Developed front end controller in Servlet to handle all the requests.
Developed the web interface using JSP and developed Struts action classes.
Responsible for gathering both functional and non-functional requirements, performing impact analysis and testing the solutions on a build-by-build basis.
Coded using Java, JavaScript and HTML.
Used JDBC to provide database connectivity to database tables in Oracle.
Used WebSphere Application Server for application deployment.
Implemented Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).

Environment: J2EE, IBM DB2, IBM WebSphere Application Server, EJB, JSP, Servlets, HTML, CSS, JavaScript, Oracle database, Unix Scripting and Windows 2000.

Java Developer

Confidential

Responsibilities:

Communicated with clients for requirements gathering and explained the requirements to team members.
Analyzing the Requirements and Designing Screen Prototypes.
Involved in Project Documentation.
Involved in creation of Basic DB Architecture for the application.
Involved in adding solution to VSS.
Designing & Development of Screens.
Coded JS functions for client validations.
Created user Controls for reusability.
Creation of Tables, Views, Packages, Sequences, Functions for all the modules of the project.
Developed Crystal Reports.
Integrating the functionality of all modules.
Involved in deploying the application.
Unit testing & integration testing.
Designing test plan, test cases and checking the validation.
Test whether the application meets the business requirements.
Implementation of the system at client Location.
Giving Training to Application users, interacting with the client, understanding the change requests if any from client.
Responsible for Immediate Error Resolving.

Environment: Core Java, JavaScript, J2EE, Servlets, JSP, Design Patterns, JDBC, HTML, CSS, AJAX, Hibernate, WebLogic, Oracle 8i, ANT, LINUX, SVN, Windows XP
