
Spark/Hadoop Developer Resume


Chicago, IL

PROFESSIONAL SUMMARY

  • 8+ years of professional IT experience in all phases of Software Development Life Cycle including hands on experience in Java/J2EE technologies and Big Data Analytics.
  • 4+ years of work experience in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Hadoop Ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, ZooKeeper, Sqoop, Flume, Oozie and AWS.
  • Extensive experience working with Teradata, Oracle, Netezza, SQL Server and MySQL databases.
  • Excellent understanding and knowledge of NOSQL databases like MongoDB, HBase, and Cassandra.
  • Strong experience working with different Hadoop distributions like Cloudera, Hortonworks, MapR and Apache distributions.
  • Experienced in installing, configuring, supporting and managing Hadoop Clusters using Apache and Cloudera (CDH 5.X) distributions and on Amazon Web Services (AWS).
  • Experience in Amazon AWS services such as EMR, EC2, S3, CloudFormation and Redshift, which provide fast and efficient processing of Big Data.
  • In-depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce, Hadoop Gen2 Federation, High Availability and YARN architecture, with a good understanding of workload management, scalability and distributed platform architectures.
  • Good understanding of R Programming, Data Mining and Machine Learning techniques.
  • Strong experience and knowledge of real time data analytics using Storm, Kafka, Flume and Spark.
  • Experienced in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
  • Experienced in installing and maintaining Cassandra by configuring the cassandra.yaml file as per the requirement.
  • Involved in upgrading existing MongoDB instances from version 2.4 to version 2.6 by upgrading the security roles and implementing newer features.
  • Responsible for performing reads and writes in Cassandra from a web application using Java JDBC connectivity.
  • Experienced in extending Hive and Pig core functionality by using custom UDFs and UDAFs.
  • Debugged MapReduce jobs using Counters and MRUnit testing.
  • Expertise in writing real-time processing applications using spouts and bolts in Storm.
  • Experienced in configuring various topologies in Storm to ingest and process data on the fly from multiple sources and aggregate it into a central Hadoop repository.
  • Good understanding of Spark Algorithms such as Classification, Clustering, and Regression.
  • Good understanding on Spark Streaming with Kafka for real-time processing.
  • Extensive experience working with Spark tools like RDD transformations, Spark MLlib and Spark SQL.
  • Experienced in moving data from different sources using Kafka producers and consumers, and preprocessing data with Storm topologies (an illustrative producer sketch follows this summary).
  • Experienced in migrating ETL logic into Pig Latin scripts using transformations and join operations.
  • Good understanding of MPP databases such as HP Vertica, Greenplum and Impala.
  • Good knowledge of streaming data from different data sources like log files, JMS and application sources into HDFS using Flume sources.
  • Experienced in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Worked on Docker-based containerized applications.
  • Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho.
  • Experienced in working with monitoring tools to check the status of clusters using Cloudera Manager, Ambari and Ganglia.
  • Experienced with testing MapReduce programs using MRUnit and JUnit.
  • Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, EJB.
  • Expertise in developing responsive front-end components with JSP, HTML, XHTML, JavaScript, DOM, Servlets, JSF, NodeJS, Ajax, jQuery and AngularJS.
  • Extensive experience in working with SOA-based architectures using REST-based web services (JAX-RS) and SOAP-based web services (JAX-WS).
  • Experience working with version control tools like SVN and Git (GitHub), JIRA/Mingle for issue tracking, and Crucible for code reviews.
  • Worked on various tools and IDEs like Eclipse, IBM Rational, Visio, the Apache Ant build tool, MS Office, PL/SQL Developer and SQL*Plus.
  • Experienced in different application servers like JBoss/Tomcat, WebLogic and IBM WebSphere.
  • Experience in working with Onsite-Offshore model.
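
The following is a minimal, illustrative sketch of the kind of Kafka producer used to move source records into a topic for downstream Storm/Spark consumers, mentioned above. It assumes the newer Java producer API (Kafka 0.9+); the broker addresses, the `weblogs` topic name and the file-based source are hypothetical placeholders, not details from any specific project.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object WebLogProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Placeholder broker list; in practice this comes from configuration.
    props.put("bootstrap.servers", "broker1:9092,broker2:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)

    // Publish each input line as a keyed message on the (hypothetical) "weblogs" topic.
    scala.io.Source.fromFile(args(0)).getLines().foreach { line =>
      producer.send(new ProducerRecord[String, String]("weblogs", line.hashCode.toString, line))
    }
    producer.close()
  }
}
```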

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Solr, Storm, Drill, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy

Hadoop Distributions: Cloudera, MapR, Hortonworks, IBM BigInsights

Languages: Java, Scala, Python, Ruby, SQL, HTML, DHTML, JavaScript, XML and C/C++

NoSQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON

Development / Build Tools: Eclipse, Ant, Maven, Gradle, IntelliJ, JUnit and Log4j

Frameworks: Struts, Spring and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle PL/SQL, MS SQL Server, MySQL and DB2

Operating systems: UNIX, LINUX, Mac OS and Windows Variants

Data analytical tools: R, SAS and MATLAB

ETL Tools: Ab Initio, Informatica PowerCenter and Pentaho

Reporting tools: Tableau

PROFESSIONAL EXPERIENCE

Confidential, Chicago IL

Spark/Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Spark Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model, which gets the data from Kafka in near real time and persists it into Cassandra (an illustrative sketch follows this list).
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries and writing data back into the OLTP system through Sqoop.
  • Worked on optimizing existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and Pair RDDs.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Worked on a POC to compare the processing time of Impala with Apache Hive for batch applications in order to implement the former in the project.
  • Worked on a cluster of 130 nodes.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Analyzed the SQL scripts and designed the solution to implement them using PySpark.
  • Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in HDFS.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate the data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive.
  • Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
  • Used Reporting tools like Tableau to connect with Hive for generating daily reports of data.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
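
A minimal sketch of the Kafka-to-Cassandra streaming pattern described above, assuming Spark 1.6 with the Kafka 0.8 direct-stream API and the DataStax spark-cassandra-connector; the broker address, topic, keyspace, table and column names are hypothetical placeholders rather than project details.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._

object LearnerEventStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("LearnerEventStream")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder
    val ssc = new StreamingContext(conf, Seconds(10))

    // Direct (receiver-less) stream from Kafka.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("learner-events"))

    // Parse each CSV-style message and persist it into a Cassandra table.
    messages.map(_._2.split(","))
      .map(fields => (fields(0), fields(1), fields(2)))
      .saveToCassandra("learner_ks", "learner_events", SomeColumns("id", "event_type", "event_ts"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```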

Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elasticsearch, Impala, Cassandra, Tableau, Talend, Oozie, Jenkins, Cloudera, Oracle 12c, Linux.

Confidential, Richmond VA

Big Data/Spark/Hadoop Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
  • Managed the fully distributed Hadoop cluster as an additional responsibility, and was trained to take over the responsibilities of a Hadoop Administrator, including managing the cluster and performing upgrades and installation of tools that use the Hadoop ecosystem.
  • Worked on installing and configuring ZooKeeper to coordinate and monitor the cluster resources.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on POCs with Apache Spark using Scala to implement Spark in the project.
  • Consumed the data from Kafka using Apache Spark.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Responsible for developing, supporting and maintaining the ETL (Extract, Transform and Load) processes using Informatica PowerCenter.
  • Interacted with product owners and DBA teams to design the project for the ETL process.
  • Developed mappings and workflows to generate staging files.
  • Developed various transformations like Source Qualifier, Sorter, Joiner, Update Strategy, Lookup, Expression and Sequence Generator for loading the data into the target table.
  • Performed analysis and provided summaries for business questions, initiating proactive investigations into data issues that impact reporting, business analysis or program execution.
  • Experienced in Excel for creating data validation, lookups and pivot tables.
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Implemented Partitioning, Dynamic Partitions and Buckets in Hive (see the sketch after this list).
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Worked in creating HBase tables to load large sets of semi structured data coming from various sources.
  • Extended Hive and Pig core functionality by writing custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregate Functions (UDAFs) for Hive and Pig using Python.
  • Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Experienced with performing CRUD operations in HBase.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Responsible for loading data files from various external sources like Oracle and MySQL into the staging area in MySQL databases.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Actively involved in code review and bug fixing for improving the performance.
  • Good experience in handling data manipulation using Python scripts.
  • Involved in developing, building, testing and deploying to the Hadoop cluster in distributed mode.
  • Created Linux shell scripts to automate the daily ingestion of IVR data.
  • Helped the Analytics team with Aster queries using HCatalog.
  • Automated the History and Purge Process.
  • Created HBase tables to store various data formats of incoming data from different portfolios.
  • Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
  • Developed the verification and control process for daily load.
  • Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
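
A brief, hypothetical sketch of the Hive partitioning and bucketing pattern referenced above, issued through a Spark 1.x HiveContext; the table, column and staging-table names are illustrative assumptions only, not objects from the original project.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HivePartitioningSketch"))
    val hiveContext = new HiveContext(sc)

    // Partitioned and bucketed table; names and columns are illustrative.
    hiveContext.sql(
      """CREATE TABLE IF NOT EXISTS sales_partitioned (
        |  order_id STRING,
        |  amount   DOUBLE
        |)
        |PARTITIONED BY (load_date STRING)
        |CLUSTERED BY (order_id) INTO 8 BUCKETS
        |STORED AS PARQUET""".stripMargin)

    // Dynamic-partition insert from a (hypothetical) staging table.
    hiveContext.sql("SET hive.exec.dynamic.partition=true")
    hiveContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    hiveContext.sql(
      """INSERT OVERWRITE TABLE sales_partitioned PARTITION (load_date)
        |SELECT order_id, amount, load_date FROM sales_staging""".stripMargin)
  }
}
```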

Environment: Hadoop, HDFS, Pig, Apache Hive, Sqoop, Kafka, Apache Spark, Storm, Solr, Shell Scripting, HBase, Python, Kerberos, Agile, ZooKeeper, Maven, Ambari, Hortonworks

Confidential, Atlanta GA

Big Data/Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
  • Involved in loading data from the Linux file system, servers and Java web services using Kafka producers and partitions.
  • Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
  • Implemented Storm topologies to pre-process data before moving into HDFS system.
  • Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
  • Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Migrated complex MapReduce programs into Spark RDD transformations and actions (a minimal sketch follows this list).
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • Involved in creating Hive tables, loading them with data and writing Hive queries which run internally as MapReduce jobs.
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in the partitioned tables.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data with MapReduce, Hive and Pig.
  • Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
  • Designed and developed SSIS (ETL) packages to validate, extract, transform and load data from the OLTP system to the data warehouse and report data mart.
  • Implemented Python scripts for writing MapReduce programs using Hadoop Streaming.
  • Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Experience in implementing custom serializers, interceptors, sources and sinks as per the requirement in Flume to ingest data from multiple sources.
  • Experience in setting up fan-out flows in Flume to design a V-shaped architecture that takes data from many sources and ingests it into a single sink.
  • Worked on implementing advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark written in Scala.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Implemented monitoring on all the NiFi flows to get notifications if there is no data flowing through the flow for more than a specified time.
  • Converted unstructured data to structured data by writing Spark code.
  • Indexed documents using Apache Solr.
  • Set up SolrCloud for distributed indexing and search.
  • Created NiFi flows to trigger Spark jobs and used PutEmail processors to get notifications if there are any failures.
  • Worked closely with the Spark team on parallel computing to explore RDDs in DataStax Cassandra.
  • Worked on NoSQL databases like Cassandra and MongoDB for POC purposes, storing images and integrating bulk data into the Cassandra file system using MapReduce programs.
  • Worked on MongoDB for distributed storage and processing.
  • Designed and implemented Cassandra and associated RESTful web service.
  • Implemented Row Level Updates and Real time analytics using CQL on Cassandra Data.
  • Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
  • Worked on analyzing and examining customer behavioral data using Cassandra.
  • Created partitioned tables in Hive, mentored analyst and SQA team for writing Hive Queries.
  • Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
  • Involved in cluster setup, monitoring, test benchmarks for results.
  • Involved in build/deploy applications using Maven and integrated with CI/CD server Jenkins.
  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
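
As a rough illustration of the MapReduce-to-Spark migration mentioned above, the word-count-style sketch below maps the classic mapper and reducer phases onto RDD transformations; the input and output paths come from command-line arguments and are assumptions for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapReduceToRddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MapReduceToRddSketch"))

    // Mapper phase -> flatMap/map; reducer phase -> reduceByKey.
    val counts = sc.textFile(args(0))        // e.g. an HDFS input path
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1L))
      .reduceByKey(_ + _)

    counts.saveAsTextFile(args(1))           // e.g. an HDFS output path
    sc.stop()
  }
}
```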

Environment: Hadoop, Cloudera, HDFS, Pig, Hive, Flume, Sqoop, NiFi, AWS Redshift, Python, Spark, Scala, MongoDB, Cassandra, Snowflake, Solr, ZooKeeper, MySQL, Talend, Shell Scripting, Linux Red Hat

Confidential

Big Data Hadoop Developer/Administrator

Responsibilities:

  • Gathered user requirements and designed technical and functional specifications.
  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, the HBase database and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
  • Involved in loading data from the Linux file system to HDFS.
  • Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios (see the sketch after this list).
  • Implemented a script to transmit Sys Prin information from Oracle to HBase using Sqoop.
  • Implemented best income logic using Pig scripts and UDFs.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of Pig queries.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Responsible to manage data coming from different sources.
  • Involved in loading data from UNIX file system to HDFS.
  • Loaded and transformed large sets of structured, semi structured and unstructured data.
  • Provided cluster coordination services through ZooKeeper.
  • Experienced in managing and reviewing Hadoop log files.
  • Job management using Fair Scheduler.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Responsible for cluster maintenance, added and removed cluster nodes, cluster monitoring and troubleshooting, managed and reviewed data backups, managed and reviewed Hadoop log files.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Supported in setting up QA environment and updated configurations for implementing scripts with Pig and Sqoop.
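
A small, hypothetical sketch of creating an HBase table from the Java client API (usable from Scala), in the spirit of the PII table work above; it assumes an HBase 0.98/1.x-era client on the classpath, and the table and column-family names are placeholders rather than actual project objects.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.HBaseAdmin

object CreatePiiTableSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()        // picks up hbase-site.xml from the classpath
    val admin = new HBaseAdmin(conf)

    val tableName = TableName.valueOf("pii_events")           // placeholder table name
    if (!admin.tableExists(tableName)) {
      val descriptor = new HTableDescriptor(tableName)
      descriptor.addFamily(new HColumnDescriptor("d"))        // single column family for the payload
      admin.createTable(descriptor)
    }
    admin.close()
  }
}
```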

Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell Scripting, Ubuntu, Linux Red Hat.

Confidential

Data Analyst

Responsibilities:

  • Acted as a liaison between the IT developers and Business stake holders and was instrumental in resolving conflicts between the management and technical teams.
  • Worked with business users for requirement gathering, understanding intent and defining scope, and was responsible for project status updates to business users.
  • Performed analysis and provided summaries for business questions, initiating proactive investigations into data issues that impact reporting, business analysis or program execution.
  • Created views for reporting purpose which involves complex SQL queries with sub-queries, inline views, multi table joins, with clause and outer joins as per the functional needs in the Business Requirements Document (BRD).
  • Involved in performance tuning of slowly running SQL queries and created indexes, constraints and rules on database objects for optimization.
  • Developed functions, views and triggers for automation.
  • Assisted in mining data from the SQL database that was used in several significant presentations.
  • Assisted in offering support to other personnel who were required to access and analyze the SQL database.
  • Worked on Python modules and packages.
  • Hands-on experience in Python scripting and in web development using Django.
  • Used Python scripts to update content in the database and manipulate files.
  • Analyzed various backup compression tools available and made recommendations.
  • Performed data analysis and data profiling using complex SQL on various sources systems including Oracle and Teradata.
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Extensive experience in the design and implementation of PL/SQL Stored Procedures, Functions, Packages, Views, Cursors, Ref Cursors, Collections, Records, Object Types, Database Triggers, Exception Handling, Forms, Reports and Table Partitioning.
  • Involved in writing T-SQL programs to implement Stored Procedures and Functions for different tasks.
  • Responsible for creating Databases, Tables, Indexes, Unique/Check Constraints, Views, Stored Procedures, Triggers and Rules.
  • Optimized the performance of queries by modifying the existing index system and rebuilding indexes.
  • Coordinated project activities between clients, internal groups and information technology, including project portfolio management and project pipeline planning, and worked in close collaboration with the Project Management Office and business users to gather, analyze and document the functional requirements for the project.
  • Responsible for development of workflow analysis, requirement gathering, data governance, data management and data loading.
  • Analyzed and documented data flow from source systems and managed the availability and quality of data.
  • Performed root cause analysis of data discrepancies between different business systems by looking at business rules and the data model, and provided the analysis to the development/bug-fix team.
  • Hands-on experience writing queries, Stored Procedures, Functions, PL/SQL Packages and Triggers in Oracle, as well as reports and scripts.
  • Evaluated existing practices of storing and handling important financial data for compliance, and ensured corporate compliance with all billing and credit standards, with direct responsibility for accounts receivable and supervision of accounts payable.
  • Has setup data governance touch points with key teams to ensure data issues were addressed promptly.
  • Responsible for facilitating UAT (User Acceptance Testing), PPV (Post Production Validation) and maintaining Metadata and Data dictionary.
  • Responsible for source data cleansing, analysis and reporting using pivot tables, formulas (v-lookup and others), data validation, conditional formatting, and graph and chart manipulation in Excel.
  • Actively involved in data modeling for the QRM Mortgage Application migration to Teradata and developed the dimensional model.
  • Experience in developing SQL*Loader control programs and PL/SQL validation scripts for validating data and loading data from staging tables to production tables.
Environment: Agile, Teradata, Oracle 12c, SQL, PL/SQL, Unix Shell Scripts, Python 2.7, MDX/DAX, SAS, PROC SQL, MS Office Tools, MS Project, Windows XP, MS Access, Pivot Tables

EDUCATION

Bachelor of Technology in Information Technology, JNTUH, 2009
