Big Data / Spark Developer Resume
Mason, OH
SUMMARY:
- 7+ years of total experience as a Software Engineer in IT, including 5+ years as a Hadoop Developer working with Big Data / Hadoop ecosystems and 2 years as a Java Developer.
- Experience in installing, configuring, and troubleshooting Hadoop ecosystem components such as MapReduce, HDFS, Sqoop, Pig, Flume, Hive, HBase, and ZooKeeper.
- Experience in upgrading existing Hadoop clusters to the latest releases.
- Experienced in using NFS (Network File System) for NameNode metadata backup.
- Experience in using Cloudera Manager 4.0 for installation and management of Hadoop cluster.
- Worked with different flavors of Hadoop distributions, which includes Cloudera (CDH4&5 Distributions) and Hortonworks.
- Excellent understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce.
- Involved in importing streaming data into HDFS using Flume; good experience in analyzing and cleansing raw data using HiveQL and Pig Latin.
- Experience in partitioning, bucketing, join optimizations, and query optimizations in Hive, and in automating Hive queries with dynamic partitioning (see the sketch at the end of this summary).
- Experience in supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud; exported and imported data to and from S3.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases like HBase; worked on HBase to load and retrieve data for real-time processing using a REST API.
- Configured NameNode HA on the existing Hadoop cluster using a ZooKeeper quorum.
- Expertise in UNIX shell scripting using ksh and bash.
- Experienced in developing and implementing MapReduce jobs in Java to process and perform various analytics on large datasets.
- Implemented a CI/CD pipeline with Jenkins, GitHub, Nexus, Maven, and AWS AMIs.
- Good experience in writing Pig Latin scripts and Hive queries.
- Good understanding of Data Structure and Algorithms.
- Good experience in developing ETL scripts for data cleansing and transformation.
- Experience in Data migration from existing data stores and mainframe NDM (Network Data mover) to Hadoop.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Experience in supporting analysts by administering and configuring HIVE.
- Hands-on programming experience in various technologies such as Java, J2EE, JSP, Servlets, SQL, JDBC, HTML, XML, Struts, Web Services, SOAP, REST, Eclipse, and Visual Studio on Windows, UNIX, and AIX.
- Experience writing SQL queries and working with Oracle and MySQL.
- Expertise in object-oriented analysis and design (OOAD) using UML and various design patterns.
- Experience in preparing deployment packages, deploying to Dev and QA environments, and preparing deployment instructions for the Production Deployment Team.
- Experience in developing client-side web applications using Core Java and J2EE technologies such as HTML, JSP, jQuery, JDBC, Hibernate, and custom tags, implementing client-side validations using JavaScript and server-side validations using the Struts and Spring validation frameworks.
- Team player with excellent analytical, communication, and project documentation skills; experienced with Agile methodology and iterative development.
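The Hive partitioning and bucketing experience above is easiest to show in code. Below is a minimal Spark-on-Hive sketch in Scala; the database, table, column, and HDFS path names are hypothetical, and partitions are derived dynamically from the data being written.

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.SparkSession

object PartitionedLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-bucketed-load")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical raw feed; event_dt is a column in the data itself,
    // so the partition values are taken dynamically from the rows.
    val events = spark.read.parquet("/data/raw/events")

    // Partition by date and bucket on user_id so date-range scans prune partitions
    // and joins/aggregations on user_id can avoid a full shuffle.
    events.write
      .mode(SaveMode.Overwrite)
      .partitionBy("event_dt")
      .bucketBy(32, "user_id")
      .sortBy("user_id")
      .format("parquet")
      .saveAsTable("analytics.events_curated")

    spark.stop()
  }
}
```

Bucketing keeps rows with the same user_id in the same bucket file, which is what lets later joins and aggregations on that key skip the shuffle.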
TECHNICAL SKILLS:
Hadoop Ecosystem: MapReduce, Pig, Hive, Oozie, Sqoop, HCatalog, Zookeeper.
Languages: C, C++, Java, XML, Unix Shell Scripting, Oracle SQL and PL/SQL, Perl, Python 2.7/3.
Java Technologies: JSE, JSP, JDBC, Hibernate.
Databases: MySQL, Teradata, Oracle.
Analytics Tools: RapidMiner, Weka, Apache Mahout.
Cloud Computing: AWS, EC2, S3.
BI Tools: Pentaho, Cognos TM1, Report Studio.
Version Control Tools: SVN, CVS, VSS, PVCS.
Operating Systems: Sun Solaris, RedHat Linux, Windows 98/XP/Vista/7/8, UNIX, Linux.
PROFESSIONAL EXPERIENCE:
Confidential - Mason, OH
Big Data / Spark Developer
Responsibilities:
- Primary role building data pipelines and working on advanced procedures such as cluster tuning, code optimization, and using the in-memory computing capabilities of Spark with Scala, as per requirements.
- Work with project managers, business owners, analyst teams, and clients; build database prototypes to validate system requirements, document code, provide progress reports, and perform code reviews and peer feedback.
- Work on converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch following this role).
- Participate in the development of technical/functional requirements and design specification as appropriate and developing the software as required.
- Design, develop, validate, and deploy Talend ETL processes for the DWH team using Hive on Hadoop.
- Build data pipelines for different ingestion and aggregation events and load consumer response data into Hive external tables in HDFS to serve as the feed for several dashboards and web APIs.
- Develop Sqoop scripts to migrate data from Oracle to the big data environment.
- Work with Spark APIs such as SparkContext, Spark SQL, Spark UDFs, and Spark DataFrames to optimize existing algorithms.
- Work with different file formats like CSV, JSON, Avro, text, and Parquet, and compression techniques like Snappy, according to client requirements.
- Integrate Spark with MongoDB and create Mongo Collections, consumed by API teams.
- Work on cluster tuning and in-memory computing capabilities of Spark using Scala based on the resources available on the cluster.
- Develop shell scripts to automate jobs, in a configurable way by passing parameters, before moving them to production.
- Schedule automated jobs on a daily and weekly basis according to requirements, using Control-M as the scheduler.
- Work on operation controls like job failure notifications, email notifications for failure logs and exceptions.
- Support the project team for successful delivery of the client's business requirements through all the phases of the implementation.
- Experienced in loading and transforming large sets of structured and semi-structured data using Spark.
Environment: Hadoop, Spark, Scala, Hive, Cloudera, HBase, Sqoop, HDP 2.6, HDFS.
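As referenced above, a minimal Scala sketch of converting a HiveQL aggregation into Spark RDD transformations; the table, columns, and output path are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-conversion")
      .enableHiveSupport()
      .getOrCreate()

    // Illustrative original HiveQL: daily response counts per campaign.
    //   SELECT campaign_id, response_dt, COUNT(*)
    //   FROM consumer_responses GROUP BY campaign_id, response_dt;

    // Equivalent RDD-based transformation over a hypothetical Hive table.
    val responses = spark.table("consumer_responses").rdd

    val dailyCounts = responses
      .map(row => ((row.getAs[String]("campaign_id"), row.getAs[String]("response_dt")), 1L))
      .reduceByKey(_ + _)

    // Land the result in an HDFS location that external Hive tables / dashboards can read.
    dailyCounts
      .map { case ((campaign, dt), cnt) => s"$campaign\t$dt\t$cnt" }
      .saveAsTextFile("/data/curated/daily_response_counts")

    spark.stop()
  }
}
```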
Confidential - Menomonee Falls, WI
Big Data / Spark Developer
Responsibilities:
- Good knowledge of and hands-on work with Spark SQL and Spark Core concepts such as Resilient Distributed Datasets (RDDs) and DataFrames.
- Worked on converting Hive queries into Spark transformations using Spark RDDs.
- Worked on improving Hive queries performance by rewriting in Spark.
- Developed Spark code using Java and Spark SQL for faster testing and processing of data.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Migrated all data and tables from Spark 1.2 to Spark 1.4: created a parallel process for all data ingestions on Spark 1.4, then compared and validated its output against 1.2 before retiring the 1.2 processes.
- Migrated all data and tables from Spark 1.4 to Spark 1.6: created a parallel process for all data ingestions on Spark 1.6, then compared and validated its output against 1.4 before retiring the 1.4 processes.
- Created HBase tables to load large sets of structured data; worked extensively on MapR M7 (HBase), with experience in row key and table design.
- Designed HBase tables for time-series data. Designed row keys to avoid region hotspotting and accommodate the desired read access/query patterns; used single-column filters for fast key searches across HBase regions (see the sketch following this role).
- Developed Utility jars for Voltage file Encryption and Decryption Using Java.
- Performed Data Orchestration for transactional, incremental tables by dividing these into different categories and joining them to form large buckets.
- Worked on Solr to Elasticsearch data migration.
- Experience in building stream processing systems using solutions such as Storm or Spark Streaming with Kafka.
- Created Talend jobs to load data into various Oracle tables and utilized Oracle stored procedures.
- Hands-on experience on the Cloudera platform and associated tools, including Navigator, Cloudera Manager, and Workload XM.
Environment: Hadoop, Spark RDD, Hive, MapReduce, HBase, Pig, Sqoop, HDP 2.6, HDFS, Talend.
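A sketch of the time-series row key design referenced above, using the standard HBase client API from Scala. The table name, column family, and salt bucket count are assumptions for illustration; the configuration is assumed to come from an hbase-site.xml on the classpath.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object TimeSeriesWriter {
  // Salt the key with a hash prefix so sequential timestamps spread across regions
  // instead of hotspotting one; reverse the timestamp so the newest rows sort first.
  def rowKey(sensorId: String, ts: Long): Array[Byte] = {
    val salt = (sensorId.hashCode & Int.MaxValue) % 16
    Bytes.toBytes(s"$salt|$sensorId|${Long.MaxValue - ts}")
  }

  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()          // reads hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("sensor_readings"))  // hypothetical table

    val put = new Put(rowKey("sensor-42", System.currentTimeMillis()))
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"), Bytes.toBytes("73.4"))
    table.put(put)

    table.close()
    connection.close()
  }
}
```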
Confidential - Jacksonville, FL
Big Data Developer
Responsibilities:
- Installed and configured Hadoop ecosystem components such as HBase and Flume.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Managed and reviewed Hadoop Log files.
- Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Deep practical knowledge of Kerberos/LDAP as well as Falcon, Ranger, Knox, and Ambari.
- Experience using the Hortonworks platform and its ecosystem. Hands-on experience installing, configuring, and using ecosystem components like Hadoop MapReduce, HDFS, Hive, and Flume.
- Experience on Apache Knox Gateway security for Hadoop Clusters.
- Developed Pig scripts to load data from files to HBase.
- Developed Hive scripts to pull data from Data Lake to our tenant.
- Proficient with UNIX shell scripting.
- Involved in design discussions for the ingestion process
- Production support for ingestion framework.
- Developed Talend jobs to identify the gaps in the data and reload the data to clear the gaps.
- Leading a team of 4 and tracking the defects to closure.
- Developed Talend jobs to configure RabbitMQ for batch load on data.
- Configured Splunk dashboard to view ingestion details.
- Developed Shell and Python scripts to automate the jobs.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Hive.
- Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments
- Involved in installing Hadoop Ecosystem components.
- Experienced in providing security to Hadoop cluster with Kerberos and integration with LDAP/AD at Enterprise level.
- Responsible for managing data coming from different sources.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Built massively scalable multi-threaded applications for bulk data processing, primarily with Apache Spark and Pig on Hadoop (see the sketch following this role).
- Developed Scripts and Batch Job to schedule various Hadoop Program.
Environment: Hadoop Distributed File System (HDFS), MapReduce, Tez, Sqoop, HDP 2.6, Talend, Hive, Pig, TAC, HBase, Splunk, RabbitMQ, SVN.
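A minimal Scala/Spark sketch of the kind of bulk log processing referenced above: parsing Flume-landed, Common Log Format-style web-server logs into a columnar form that can power search and aggregation. The log format, paths, and field names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object LogBulkProcessor {
  // Hypothetical log line: <ip> - - [timestamp] "GET /path HTTP/1.0" <status> <bytes>
  private val LogLine = """(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\d+)""".r

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("log-bulk-processing").getOrCreate()
    import spark.implicits._

    // Logs landed in HDFS by Flume (path is illustrative).
    val raw = spark.sparkContext.textFile("/data/flume/weblogs/*")

    val parsed = raw.flatMap {
      case LogLine(ip, ts, method, path, status, bytes) =>
        Some((ip, ts, method, path, status.toInt, bytes.toLong))
      case _ => None   // drop unparseable lines instead of failing the job
    }.toDF("ip", "ts", "method", "path", "status", "bytes")

    // Persist in a columnar format for downstream search and aggregation.
    parsed.write.mode("overwrite").parquet("/data/curated/weblogs")

    spark.stop()
  }
}
```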
Confidential - Phoenix, AZ
Hadoop Developer
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper, Cassandra and Sqoop.
- Implemented NameNode High Availability using Quorum Journal Managers and ZooKeeper Failover Controllers.
- Managed a 350+ node HDP 2.3 cluster with 4 petabytes of data using Ambari 2.0 and Linux CentOS 7.
- Familiar with Hadoop Security involving LDAP, Kerberos, Ranger.
- Strong experience using Ambari to administer large Hadoop clusters (100+ nodes).
- After data transformation is complete, the transformed data is moved to a Spark cluster, where it goes live to the application using Spark Streaming and Kafka (see the sketch following this role).
- Configured LDAP user management access.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Installed and configured Hadoop, MapReduce, HDFS (Hadoop Distributed File System)
- Collected log data from web servers and integrated it into HDFS using Flume.
- Set up Kerberos locally on a 5-node POC cluster using Ambari, evaluated cluster performance, and performed impact analysis of Kerberos enablement.
- Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments.
- Worked on SSL/TLS implementation.
- Configured SSL and performed troubleshooting in Hue.
- Responsible for building scalable distributed data solutions using Hadoop on Cloudera.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
- Enabled Kerberos for authorization and authentication.
- Enabled HA for NameNode, ResourceManager, YARN configuration, and Hive Metastore.
- Configured JournalNodes and ZooKeeper services for the cluster using Cloudera.
- Monitored Hadoop cluster job performance and capacity planning.
- Monitored and reviewed Hadoop log files.
- Performed Cloudera Manager and CDH upgrades.
- Took backups of critical data and Hive data and created snapshots.
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and reviewing Hadoop log files.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Extracted data using Flume and imported/exported data between HDFS and RDBMS using Sqoop.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Good Knowledge of NoSQL database like HBase.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Performance tuning of Impala jobs and resource management in cluster.
Environment: HDFS, MapReduce (MR1), Pig, Hive, Oozie, Sqoop, Cassandra, AWS, Talend, Java, Unix Shell Scripting.
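A minimal Spark Streaming + Kafka sketch in Scala for the streaming hand-off referenced above. It assumes the spark-streaming-kafka-0-10 integration is on the classpath, and the broker, topic, consumer group, and output path are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object TransactionStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("transaction-stream")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Hypothetical broker/topic/group names, for illustration only.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "txn-consumers",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("transactions"), kafkaParams))

    // Count records per key in each 30-second batch and persist the counts to HDFS.
    stream.map(r => (Option(r.key).getOrElse("unknown"), 1L))
          .reduceByKey(_ + _)
          .saveAsTextFiles("/data/streaming/txn_counts")

    ssc.start()
    ssc.awaitTermination()
  }
}
```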
Confidential - Atlanta, GA
Big Data/Hadoop Developer
Responsibilities:
- Day-to-day responsibilities include solving developer issues, handling deployments (moving code from one environment to another), providing access to new users, delivering quick solutions to reduce impact, documenting them, and preventing recurrence of issues.
- Added/installed new components and removed them through Cloudera Manager.
- Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades
- Worked with cloud services such as Azure and was involved in ETL, data integration, and migration.
- Wrote Lambda functions in Python which invoke Python scripts to perform various transformations and analytics on large datasets in EMR clusters.
- Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log files.
- Responsible for designing and developing the business components using Java.
- Created Java classes and interfaces to implement the system.
- Designed, built, and deployed a multitude of applications utilizing much of the Azure stack, focusing on high availability, fault tolerance, and auto-scaling.
- Designed and developed automation test scripts using Python.
- Azure Cloud Infrastructure design and implementation utilizing ARM templates.
- Orchestrated hundreds of Sqoop scripts, Python scripts, and Hive queries using Oozie workflows and sub-workflows.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Created Hive External tables and loaded the data in to tables and query data using HQL.
- Performed data analysis by running Hive queries.
- Working with data delivery teams to setup new Hadoop users. This job includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive.
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs in Java (see the sketch following this role).
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Created and truncated HBase tables in Hue and took backups of submitter IDs.
- Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
- Creating and managing Azure Web-Apps and providing the access permission to Azure AD users.
- Commissioned and Decommissioned nodes on CDH5 Hadoop cluster on Red hat LINUX.
- Involved in loading data from LINUX file system to HDFS.
- Experience configuring Storm to load data from MySQL to HBase using JMS.
- Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
- Experience in managing and reviewing Hadoop log files.
Environment: HDFS, MapReduce, Hive, Hue, Pig, Azure, Flume, Oozie, Sqoop, CDH5, Apache Hadoop, Spark, Python, R, Qlik, Hortonworks, Ambari, Cloudera Manager, Red Hat, Java, MySQL and Oracle.
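The custom UDF work above was done in Java; a minimal equivalent sketch is shown here in Scala (for consistency with the other sketches), using the classic Hive UDF API. The class and function names are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Minimal Hive UDF: normalizes a free-text field to upper case, returning NULL for NULL input.
class NormalizeText extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
}
```

It would be registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL.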
Confidential - San Antonio
Java Developer
Responsibilities:
- Developed a web application using Spring, Spring IoC, Spring annotations, Spring MVC, Spring Transactions, Hibernate, SQL, and IBM WebSphere.
- Development of the service layer using Java/J2EE.
- Created internal routes using REST web services with Spring, which can accept and send objects in JSON format (see the sketch following this role).
- Strong implementation experience with object-oriented concepts, multithreading, and Java/Scala.
- Involved in multi-tiered J2EE design utilizing Spring IOC architecture and Hibernate.
- Experienced in developing web services and worked with WebSphere Application Server.
- Involved in Analysis, Design and Implementation of Business User Requirements.
- Designed table-less layouts using CSS and appropriate HTML tags as per W3C standards.
- Created optimized graphic websites and application interfaces using HTML, CSS, and the Spring framework.
- Created various Parser programs to extract data from Autosys, Tibco Business Objects, XML, Informatica, Java and database views using Scala.
- Extensively worked on AJAX to implement front end /user interface features in the application.
- Developed CSS style sheets to give gradient effects. Developed page layouts, navigation and icons.
- Used Bootstrap in combination with AngularJS to develop the site as a responsive website.
- Created Custom filters and directives to process the data or to render a reusable DOM.
- Used JavaScript extensively for validation, DOM manipulation etc.
- Used GitHub as the version control tool.
- Worked with build tools like Jenkins to deploy application.
Environment: Spring, Hibernate, JMS, SOAP web service client (using JAX-WS), RESTful web services client (using JAX-RS), AngularJS, Bootstrap, HTML, CSS, AJAX, Scala, Oracle, SQL, Eclipse, GIT, Jenkins, IBM WebSphere.
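A minimal sketch of the Spring REST routes referenced above, written in Scala for consistency with the other sketches; the endpoint, DTO, and field names are hypothetical, and JSON (de)serialization is assumed to be handled by Spring's default Jackson setup.

```scala
import scala.beans.BeanProperty
import org.springframework.web.bind.annotation.{GetMapping, PathVariable, RequestMapping, RestController}

// Simple bean-style DTO so Jackson can serialize it to JSON without extra modules.
class AccountDto(@BeanProperty var id: String,
                 @BeanProperty var status: String) {
  def this() = this(null, null)   // no-arg constructor for deserialization
}

@RestController
@RequestMapping(Array("/api/accounts"))
class AccountController {

  // GET /api/accounts/{id} returns the account as JSON.
  @GetMapping(Array("/{id}"))
  def get(@PathVariable("id") id: String): AccountDto =
    new AccountDto(id, "ACTIVE")   // placeholder lookup for illustration
}
```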
Confidential - Minnesota
Java Developer
Responsibilities:
- Involved in requirements gathering, software change requests, and detailed analysis, as well as in the design, development, and testing phases of the software development life cycle.
- Followed Agile methodology for application development and worked on HTML, JavaScript, and JSP for UI development.
- Developed REST web services using the Jersey library; used JAXB for marshalling and unmarshalling.
- Involved in consuming SOAP-based web services; used SoapUI for web service testing by sending SOAP requests.
- Wrote Spring XML Configuration files for various modules and Configured logging using log4j.
- Wrote SQL queries to query data from database.
- Used IBM RAD IDE for application development and Deployed the application on WebSphere Portal server.
- Developed unit test cases using JUnit and JMock, and used Maven for builds (see the sketch following this role).
- Performed Defect Tracking for defect traceability by using tools such as HP Quality Centre.
- Used SVN for configuration management.
Environment: Java, JDK 1.6, Servlets, JSP, JSF, JavaScript, Web Services, SOAP, REST, Spring, Hibernate, Portlets, AJAX and XML, XSD, JUnit, JMock, RAD, WebSphere, HP Quality Centre.
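A minimal JUnit 4 sketch, in Scala for consistency with the other examples, of the style of unit test referenced above; the class under test and its behavior are hypothetical.

```scala
import org.junit.Assert.assertEquals
import org.junit.Test

// Minimal JUnit 4 test for a hypothetical mapping helper; names are illustrative.
class OrderMapperTest {

  @Test
  def mapsStatusCodeToLabel(): Unit = {
    val mapper = new OrderMapper
    assertEquals("SHIPPED", mapper.statusLabel(3))
  }
}

// Trivial class under test, included so the sketch compiles on its own.
class OrderMapper {
  def statusLabel(code: Int): String = code match {
    case 1 => "NEW"
    case 2 => "PACKED"
    case 3 => "SHIPPED"
    case _ => "UNKNOWN"
  }
}
```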