Sr. Hadoop/Spark Developer Resume
Wilmington, DE
SUMMARY
- 8+ years of experience in IT, with a strong emphasis on designing and implementing analytic solutions on Hadoop and Java-based enterprise applications.
- 4 years of implementation and extensive working experience in writing Hadoop jobs for analyzing data using a wide array of Big Data tools like Hive, Pig, Flume, Oozie, Sqoop, Kafka, ZooKeeper and HBase.
- An accomplished Hadoop/Spark developer experienced in ingestion, storage, querying, processing and analysis of big data.
- Extensive experience in developing applications that perform data processing tasks using Teradata, Oracle, SQL Server and MySQL databases.
- Hands-on expertise in row-key and schema design for NoSQL databases like MongoDB 3.0.1, HBase, Cassandra and DynamoDB (AWS).
- Extensively worked on Spark with Scala on clusters for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
- Excellent Programming skills at a higher level of abstraction using Scala, Java and Python.
- Extensive experience in working with various distributions of Hadoop, including enterprise versions of Cloudera (CDH4/CDH5) and Hortonworks, with good knowledge of the MapR distribution, IBM BigInsights and Amazon's EMR (Elastic MapReduce).
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Experienced in implementing scheduler using Oozie, Airflow, Crontab and Shell scripts.
- Prepared JIL scripts for scheduling workflows using Autosys and automated jobs with Oozie.
- Good working experience in importing data using Sqoop and SFTP from various sources like RDBMS, Teradata, Mainframes, Oracle and Netezza to HDFS, and performed transformations on it using Hive, Pig and Spark.
- Extensive experience in importing and exporting streaming data into HDFS using stream processing platforms like Flume and Kafka messaging system.
- Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
- Exposure to Data Lake implementation using Apache Spark; developed data pipelines and applied business logic using Spark.
- Hands-on experience in analytical tools like SAS, R, RStudio, Python, NumPy, scikit-learn, Spark MLlib, Neo4j and GraphDB.
- Extensively worked on Spark streaming and Apache Kafka to fetch live stream data.
- Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and performed the data transformations using Spark Core (a brief sketch appears at the end of this summary).
- Experience in integrating Hive queries into Spark environment using Spark SQL.
- Expertise in performing real time analytics on big data using HBase and Cassandra.
- Experience in developing data pipeline using Pig, Sqoop, and Flume to extract the data from weblogs and store in HDFS. Accomplished developing Pig Latin Scripts and using Hive Query Language for data analytics.
- Developed customized UDFs and UDAFs in java to extend Pig and Hive core functionality.
- Experience in validating and cleansing the data using Pig Latin operations and UDFs. Hands-on experience in developing Pig MACROS.
- Strong familiarity with creating Hive tables, Hive joins and HQL for querying databases, extending to complex Hive UDFs.
- Worked on GUI Based Hive Interaction tools like Hue, Karmasphere for querying the data.
- Proficient in NoSQL databases including HBase, Cassandra and MongoDB, and their integration with Hadoop clusters.
- Working knowledge in installing and maintaining Cassandra by configuring the cassandra.yaml file as per the business requirement and performed reads/writes using Java JDBC connectivity.
- Experience in writing Complex SQL queries, PL/SQL, Views, Stored procedure, triggers, etc.
- Involved in maintaining the Big Data servers using Ganglia and Nagios.
- Wrote multiple MapReduce jobs using the Java API, Pig and Hive for data extraction, transformation and aggregation from multiple file formats including Parquet, Avro, XML, JSON, CSV and ORC, and compression codecs like Gzip, Snappy and LZO.
- Good experience in optimizing MapReduce algorithms using Mappers, Reducers, combiners and partitioners to deliver the best results for the large datasets.
- Expert in coding Teradata SQL, Teradata stored procedures, macros and triggers.
- Installed, Configured and Administered PostgreSQL Databases in the Dev, Staging and Prod environment.
- Extracted data from various data sources including OLE DB, Excel, flat files and XML.
- Experienced in using build and logging tools like Ant, SBT, Maven and Log4j to build and deploy applications to the server.
- Experienced in migrating data from different sources using PUB-SUB model in Redis, and Kafka producers, consumers and preprocess data using Storm topologies.
- Experienced in writing Ad Hoc queries using Cloudera Impala, also used Impala analytical functions. Good understanding of MPP databases such as HP Vertica.
- Competent with configuration and automation tools such as Chef, Puppet and Ansible. Configured and administered CI tools like Jenkins, Hudson and Bamboo for automated builds.
- Knowledge in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4) distributions and on Amazon web services (AWS).
- Built AWS secured solutions by creating VPC with public and private subnets.
- Proficient in developing, deploying and managing the SOLR from development to production.
- Experience in Enterprise search using SOLR to implement full text search with advanced text analysis, faceted search, filtering using advanced features like dismax, extended dismax and grouping.
- Worked on data warehousing and ETL tools like Informatica, Talend, and Pentaho.
- Designed ETL workflows on Tableau and deployed data from various sources to HDFS.
- Working experience with test management and automation tools: HP Quality Center, HP ALM, LoadRunner, QTP and Selenium.
- Worked on the ELK stack (Elasticsearch, Logstash, Kibana) for log management.
- Experience in managing and reviewing Hadoop log files.
- Hands-on knowledge of Core Java concepts like exceptions, collections, data structures, I/O, multi-threading, and serialization and deserialization of streaming applications.
- Worked on various programming languages using IDEs like Eclipse, NetBeans, and Intellij.
- Experience in software design, development and implementation of client/server web-based applications using JSTL, jQuery, JavaScript, Java Beans, JDBC, Struts, PL/SQL, SQL, HTML, CSS, PHP, XML and AJAX, with a high-level overview of the React JavaScript library.
- Used various Project Management services like JIRA for tracking issues, bugs related to code and GitHub for various code reviews and Worked on various version control tools like CVS, GIT, PVCS, SVN.
- Proficiency with the application servers like WebSphere, WebLogic, JBOSS and Tomcat
- Experience with best practices of web services development and integration (both REST and SOAP).
- Experienced working in Test-Driven and Behavior-Driven Development.
- Experience in automated scripts using Unix shell scripting to perform database activities.
- Experience in complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
- Working experience with Linux distributions like Red Hat and CentOS.
- Good understanding of all aspects of Testing such as Unit, Regression, Agile, White-box, Black-box.
- Good analytical, communication and problem-solving skills, and a passion for learning new technical and functional skills.
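Illustrative sketch (not taken from any specific engagement below): a minimal Scala/Spark example of the RDD transformations, case classes and DataFrame work referenced in this summary. All paths, table and column names are hypothetical placeholders.

```scala
// Minimal, illustrative Spark (Scala) sketch of RDD transformations, a case class,
// and a DataFrame aggregation. Paths and column names are hypothetical.
import org.apache.spark.sql.SparkSession

object SummarySketch {
  // Case class used to give the raw input a typed structure.
  case class Txn(accountId: String, amount: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("summary-sketch")
      .enableHiveSupport()          // lets Spark SQL read existing Hive tables
      .getOrCreate()
    import spark.implicits._

    // RDD transformations and an action on a hypothetical text input.
    val txns = spark.sparkContext
      .textFile("hdfs:///data/txns/*.csv")       // hypothetical path
      .map(_.split(","))
      .filter(_.length == 2)
      .map(a => Txn(a(0), a(1).toDouble))

    // Equivalent of a Hive/SQL aggregate expressed on a DataFrame.
    val totals = txns.toDF()
      .groupBy("accountId")
      .sum("amount")

    totals.show(10)
    spark.stop()
  }
}
```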
TECHNICAL SKILLS
Big Data Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Storm, Flume, Spark, Apache Kafka, Zookeeper, Solr, Ambari, Oozie, MongoDB, Cassandra, Mahout, Puppet, Avro, Parquet, Snappy, Falcon.
NoSQL Databases: HBase, Cassandra, MongoDB, Amazon DynamoDB, Redis
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache.
Languages: C, C++, Java, Scala, Python, XML, XHTML, HTML, AJAX, CSS, SQL, PL/SQL, Pig Latin, HiveQL, Unix, Java Script, Shell Scripting
Java & J2EE Technologies: Core Java, Java 7, Java 8, Hibernate, Spring Framework, JSP, Servlets, Java Beans, JDBC, EJB 3.0, Java Sockets, JavaScript, jQuery, JSF, PrimeFaces, SOAP, XSLT, DHTML; Messaging Services: JMS, MQ Series, MDB; J2EE MVC: Struts 2.1, Spring 3.2 MVC, Spring Web, JUnit, MRUnit.
Source Code Control: GitHub, CVS, SVN, ClearCase
Application Servers: WebSphere, WebLogic, JBoss, Tomcat
Cloud Computing Tools: Amazon AWS (S3, EMR, EC2, Lambda, VPC, Route 53, CloudWatch, CloudFront), Microsoft Azure
Databases: Teradata, Oracle 10g/11g, Microsoft SQL Server, MySQL, DB2
DB languages: MySQL, PL/SQL, PostgreSQL & Oracle
Build & Logging Tools: Jenkins, Maven, Ant, Log4j
Business Intelligence Tools: Tableau, Splunk
Development Tools: Eclipse, IntelliJ, Microsoft SQL Studio, Toad, NetBeans
ETL Tools: Talend, Pentaho, Informatica, Ab Initio
Development Methodologies: Agile, Scrum, Waterfall, V model, Spiral
PROFESSIONAL EXPERIENCE
Confidential, Wilmington, DE
Sr. Hadoop/Spark Developer
Responsibilities:
- Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
- Provided technical expertise in delivering backend orchestration for executing bank processes.
- Responsible for building scalable distributed data solutions using Apache Hadoop and Spark.
- Used Kafka features like partitioning, replication and the commit-log service for messaging systems by maintaining feeds.
- Experienced in writing live Real-time Processing and core jobs using Spark Streaming with Kafka as a data pipe-line system.
- Involved in loading data from rest endpoints to Kafka Producers and transferring the data to Kafka Brokers.
- Involved in writing a customized Java Spring Boot framework to read and parse incoming messages from Kafka topics for both batch and real-time data on the System of Record (SOR) cluster.
- Configured and developed synchronous and asynchronous Kafka producers, high-level and low-level consumer APIs, topics and brokers using Java.
- Developed a NiFi Workflow to pick up the data from mainframes, SFTP servers, IBM CDC replication engine and send that to Kafka broker.
- Used Apache Spark with ELK cluster for obtaining some specific visualization which require more complex data processing/querying.
- Developed Spark Applications by using Scala, Java and Implemented Apache Spark data processing project to handle data from various RDBMS and Streaming sources.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN.
- Implemented Spark SQL for faster processing of data and handled skewed data for real-time analysis in Spark.
- Migrated from Flume to Spark for real-time data and developed a Spark Streaming application in Java to consume data from Kafka and push it into Hive (a simplified sketch follows this list).
- Evaluated Hortonworks NiFi (HDF 2.0) and recommended solution to inject data from multiple data sources to HDFS & Hive using NiFi and importing data using Nifi tool from Linux servers.
- Performed ad-hoc queries on structured data using Hive QL and used Partitioning, bucketing techniques and joins with Hive for faster data access.
- Used DBeaver as a SQL client to review sample data and inspect the structure of data in the Hive database.
- Worked with ELK Stack cluster for importing logs into Logstash, sending them to Elasticsearch nodes and creating visualizations in Kibana.
- Designed Columnar families in Cassandra and Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirement.
- Tested the cluster Performance using Cassandra-stress tool to measure and improve the Read/Writes.
- Diagnosed Cassandra problems by setting Log4J Debug mode for detailed tracing and analyzing Cassandra deferred reads and writes.
- Configured internode communication between Cassandra nodes and client using SSL encryption.
- Worked on tuning Bloom filters and configured compaction strategy based on the use case.
- Closely associated with the Cassandra DBA in implementing the Cassandra data model in the application environment, ensuring the solution did not affect existing business as usual.
- Implemented Unit/Load testing using ScalaTest, JUnit/JMeter/Blazemeter tools.
- Involved in reading data formats like Gzip, Avro and Parquet and compressing them according to the business logic by writing generic code.
- Coordinated with different teams on implementing application logging in Splunk and creating dashboards.
- Monitored applications on Dynatrace and Wily Introscope for our microservices running on the custom cloud (GAIA).
- Experienced with the GAIA custom cloud in the JPMorgan environment, creating Kafka services by parsing them from a JSON string, and developed other modules such as lightswitch and Blue/Green deployment in the cloud.
- Worked on provisioning and managing multi-tenant Hadoop clusters on a public cloud environment - Amazon Web Services (AWS)/GAIA - and on private cloud infrastructure - the OpenStack cloud platform.
- Hands-on experience deploying code to production servers following organization standards (DevOps model - Jenkins, Jules, Bitbucket, Changeman tools).
- Performed code analysis through tool sets such as SONAR, Fortify (SSAP) and Black Duck, in addition to risk and security initiatives such as pen tests.
- Involved in designing batches to automate execution in CA Uniter, CTRL-M, Oozie and Autosys.
- Followed the complete Agile methodology, from product backlog, sprint backlog and sprint planning to sprint retrospectives, user stories and assigning story points.
- Conduct code reviews for team members to ensure proper test coverage and consistent code standards.
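A simplified sketch of the Spark Streaming consumption from Kafka referenced above. The production job was written in Java and persisted to Hive; a Scala Structured Streaming skeleton with a console sink keeps this example self-contained, and the broker address and topic name are hypothetical placeholders.

```scala
// Simplified Scala sketch of a Spark Structured Streaming job consuming from Kafka.
// Broker address and topic name are hypothetical; the real application wrote to Hive.
import org.apache.spark.sql.SparkSession

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming-sketch")
      .getOrCreate()

    // Read raw Kafka records; key/value arrive as binary columns.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")  // hypothetical broker
      .option("subscribe", "sor-events")                   // hypothetical topic
      .load()
      .selectExpr("CAST(value AS STRING) AS message")

    // Console sink used here so the sketch runs standalone.
    val query = stream.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```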
Environment: Apache Kafka, Spark, Spark-Streaming, Spark SQL, AWS EMR, CDH, HDFS, Hive, Java (JDK SE 6, 7), Scala, Shell scripting, Maven, IBM Infosphere, Jules/Jenkins, Groovy, Lombok, IntelliJ, Splunk, Dynatrace, Oracle, BitBucket, Jira, Nifi, MySQL, Soap, Cassandra and Agile Methodologies.
Confidential, Hudson Yards, NYC
Sr. Big Data Engineer
Responsibilities:
- Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
- Responsible for building scalable distributed data solutions using Apache Hadoop and Spark.
- Deployed a scalable Hadoop cluster on AWS using S3 as the underlying file system for Hadoop.
- Worked on the cluster disaster recovery plan for the Hadoop cluster by implementing cluster data backup in Amazon S3 buckets.
- Developed Spark scripts using the Scala IDE as per the business requirement.
- Collected JSON data from an HTTP source and developed Spark APIs that help to do inserts and updates in Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets and imported data from different sources like HDFS/HBase into Spark RDDs.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets data from Kinesis in near real time and persists it into Cassandra.
- Worked with XML, extracting tag information using XPath and Scala XML libraries from compressed blob datatypes.
- Wrote several RESTful APIs in Scala, a functional language, to implement the defined functionality.
- Involved in making changes to Spark APIs when migrating from Spark 1.6 to Spark 2.2.0.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Worked on the integration of the Kafka messaging service for near-live stream processing.
- Partitioned data streams using Kafka; designed and configured a Kafka cluster to accommodate heavy throughput of 1 million messages per second. Used the Kafka producer APIs to produce messages (see the sketch at the end of this list).
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive, and loaded the transformed data back into HDFS.
- Built and maintained scalable data pipelines using theHadoopecosystem and other open source components like Hive and HBase.
- Performed a POC on HBase observer to improve the performance of persisting tables in HBase.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Implemented Sqooping from Oracle to Hadoop and loading back in Parquet format.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System (HDFS).
- Designed Columnar families in Cassandra and Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirement.
- Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
- Configured and implemented Jenkins, Maven and Nexus for continuous integration.
- Worked with various AWS components such as EC2, S3, IAM, VPC, RDS, Route 53, SNS and SQS.
- Used Jira for bug tracking and Bit Bucket to check-in and checkout code changes.
- Experienced working in Dev, staging & prod environment.
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
- Actively involved in code review and bug fixing for improving the performance.
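A minimal Scala sketch of the Kafka producer API usage noted above; the broker address, topic name and message contents are hypothetical placeholders.

```scala
// Illustrative Scala sketch of Kafka producer API usage. Broker and topic are hypothetical.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")  // hypothetical broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all")  // wait for full replication before acknowledging

    val producer = new KafkaProducer[String, String](props)
    try {
      // Send a single keyed message to a hypothetical topic.
      producer.send(new ProducerRecord[String, String]("learner-events", "key-1", "payload"))
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```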
Environment: Spark, Spark-Streaming, Spark SQL, AWS EMR, CDH, HDFS, Hive, Apache Kafka, Sqoop, Java (JDK SE 6, 7), Scala, Shell scripting, Maven, Jenkins, Eclipse, Oracle, BitBucket, Oozie, MySQL, Soap, Cassandra.
Confidential, MI
Sr. Hadoop/Spark Developer
Responsibilities:
- Developed Spark Applications by using Scala, Java and Implemented Apache Spark data processing project to handle data from various RDBMS and Streaming sources.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building common learner data model which gets the data from Kafka in Near real time and persist it to Cassandra.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
- Consumed XML messages using Kafka and processed the xml file using Spark Streaming to capture UI updates.
- Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files (sketched at the end of this list).
- Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
- Worked and learned a great deal from AWS Cloud services like EC2, S3, EBS, RDS and VPC.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets processing and storage, Experienced in Maintaining the Hadoop cluster on AWS EMR.
- Imported data from AWS S3 into Spark RDD, Performed transformations and actions on RDD's.
- Implemented Elasticsearch on the Hive data warehouse platform.
- Worked with Elastic MapReduce and set up the Hadoop environment on AWS EC2 instances.
- Good understanding of Cassandra architecture, replication strategy, gossip, snitch etc.
- Used the SparkDataStax Cassandra Connector to load data to and from Cassandra.
- Experienced in creating data models for clients' transactional logs; analyzed data from Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL).
- Tested the cluster Performance using Cassandra-stress tool to measure and improve the Read/Writes.
- Used Hive QL to analyze the partitioned and bucketed data, Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business specification logic.
- Used Kafka features like partitioning, replication and the commit-log service for messaging systems by maintaining feeds.
- Used Apache Kafka to aggregate web log data from multiple servers and make them available in Downstream systems for Data analysis and engineering type of roles.
- Experience in using Avro, Parquet, RCFile and JSON file formats, developed UDFs in Hive and Pig.
- Hands on experience on Cloudera Hue to import data on to the Graphical User Interface.
- Worked with Log4j framework for logging debug, info & error data.
- Performed transformations like event joins, filter bot traffic and some pre-aggregations using PIG.
- Developed Custom Pig UDFs in Java and used UDFs from PiggyBank for sorting and preparing the data.
- Developed Custom Loaders and Storage Classes in PIG to work on several data formats like JSON, XML, CSV and generated Bags for processing using pig etc.
- Implemented ETL standards utilizing proven data processing patterns with open source standard tools like Talend and Pentaho for more efficient processing.
- Well versed on Data Warehousing ETL concepts using Informatica Power Center, OLAP, OLTP and AutoSys.
- Used Amazon DynamoDB to gather and track the event based metrics.
- Developed Sqoop and Kafka Jobs to load data from RDBMS, External Systems into HDFS and HIVE.
- Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
- Wrote several MapReduce jobs using the Java API; also used Jenkins for continuous integration.
- Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig and MapReduce access for new users.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Modified ANT Scripts to build the JAR's, Class files, WAR files and EAR files.
- Developed the application using Eclipse and used Maven as the build and deploy tool.
- Generated various kinds of reports using Power BI and Tableau based on Client specification.
- Responsible for generating actionable insights from complex data to drive real business results for various application teams and worked in Agile Methodology projects extensively.
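A minimal Scala sketch of the JSON-flattening preprocessing job mentioned above; the input path, nested field names and output location are hypothetical.

```scala
// Minimal Scala sketch: flatten nested JSON documents into a flat (delimited) file
// with Spark DataFrames. Paths and field names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object FlattenJsonSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("flatten-json-sketch").getOrCreate()

    // Spark infers the (possibly nested) schema from the JSON documents.
    val raw = spark.read.json("hdfs:///landing/events/*.json")   // hypothetical path

    // Pull nested attributes up to top-level columns, producing a flat layout.
    val flat = raw.select(
      col("id"),
      col("user.name").as("user_name"),       // hypothetical nested fields
      col("user.address.city").as("city")
    )

    // Write the flattened records out as a delimited flat file.
    flat.write.option("header", "true").csv("hdfs:///processed/events_flat")
    spark.stop()
  }
}
```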
Environment: Spark, Spark-Streaming, Spark SQL, AWS EMR, CDH, HDFS, Hive, Pig, Apache Kafka, Sqoop, Java (JDK SE 6, 7), Scala, Shell scripting, Linux, MySQL Oracle Enterprise DB, SOLR, Jenkins, Eclipse, Oracle, Git, Oozie, Tableau, MySQL, Soap, Cassandra and Agile Methodologies.
Confidential, Cleveland, OH
Hadoop Developer
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Experience in upgrading the Cloudera Hadoop cluster from 5.3.8 to 5.8.0 and from 5.8.0 to 5.8.2.
- Hands-on experience with the Hadoop ecosystem components (HDFS, YARN, MapReduce, Hive, Spark, Flume, Oozie, ZooKeeper, Impala, HBase and Sqoop) through Cloudera Manager.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala; initial versions were done in Python (PySpark). A representative migration is sketched after this list.
- Developed Spark jobs using Scala on top of Yarn/MRv2 for interactive and Batch Analysis.
- Experienced in querying data using SparkSQL on top of Spark engine for faster data sets processing.
- Worked on implementing the Spark Framework, a Java-based web framework.
- Worked with Apache SOLR to implement indexing and wrote Custom SOLR query segments to optimize the search.
- Written java code to format XML documents, uploaded them to Solr server for indexing.
- Experienced on Apache Solr for indexing and load balanced querying to search for specific data in larger datasets and implemented Near Real Time Solr index on Hbase and HDFS.
- Worked on Ad hoc queries, Indexing, Replication, Load balancing, Aggregation in MongoDB.
- Processed the Web server logs by developing Multi-hop flume agents by using Avro Sink and loaded into MongoDB for further analysis, also extracted files from MongoDB through Flume and processed.
- Expert knowledge of MongoDB NoSQL data modeling, tuning and disaster-recovery backups; used it for distributed storage and processing via CRUD operations.
- Extracted and restructured the data into MongoDB using import and export command line utility tool.
- Experience in setting up fan-out workflows in Flume to design a V-shaped architecture that takes data from many sources and ingests it into a single sink.
- Implemented custom serializers and interceptors in Flume to mask confidential data and filter unwanted records from the event payload.
- Experience in creating, dropping and altering tables at run time without blocking updates and queries, using HBase and Hive.
- Experience in working with different join patterns and implemented both Map and Reduce Side Joins.
- Wrote Flume configuration files for importing streaming log data into HBase with Flume.
- Imported several transactional logs from web servers with Flume to ingest the data into HDFS. Using Flume and Spool directory for loading the data from local system(LFS) to HDFS.
- Installed and configured Pig and wrote Pig Latin scripts to convert data from text files to Avro format.
- Created Partitioned Hive tables and worked on them using HiveQL.
- Loading Data into HBase using Bulk Load and Non-bulk load.
- Installed and configured Talend ETL in single- and multi-server environments.
- Experience in monitoring Hadoop cluster using Cloudera Manager, interacting with Cloudera support and log the issues in Cloudera portal and fixing them as per the recommendations.
- Experience in Cloudera Hadoop Upgrades and Patches and Installation of Ecosystem Products through Cloudera manager along with Cloudera Manager Upgrade.
- Worked on continuous Integration tools Jenkins and automated jar files at end of day.
- Worked with Tableau and Integrated Hive, Tableau Desktop reports and published to Tableau Server.
- Developed data pipelines using Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Developed REST APIs using Java, Play framework and Akka.
- Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
- Developed Unix shell scripts to load large number of files into HDFS from Linux File System.
- Experience in setting up the whole application stack; set up and debugged Logstash to send Apache logs to AWS Elasticsearch.
- Collaborated with Database, Network, application and BI teams to ensure data quality and availability.
- Used Impala connectivity from the User Interface(UI) and query the results using ImpalaQL.
- Wrote and implemented Teradata FastLoad, MultiLoad and BTEQ scripts, DML and DDL.
- Used Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
- Experienced in designing RESTful services using Java-based APIs like Jersey.
- Worked in Agile development environment having KANBAN methodology. Actively involved in daily Scrum and other design related meetings.
- Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
- Experienced in using agile approaches including Test-Driven Development, Extreme Programming, and Agile Scrum.
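A representative Scala sketch of migrating a MapReduce-style job to Spark transformations, as described above; the classic word count stands in for the actual programs, and the input/output paths are hypothetical.

```scala
// Sketch: the classic map/shuffle/reduce word count expressed as a short Spark RDD pipeline.
// Input and output paths are hypothetical.
import org.apache.spark.sql.SparkSession

object WordCountMigrationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mr-to-spark-sketch").getOrCreate()
    val sc = spark.sparkContext

    val counts = sc.textFile("hdfs:///input/logs/*.log")   // hypothetical input
      .flatMap(_.split("\\s+"))        // mapper: emit one record per token
      .map(word => (word, 1))          // mapper output: (key, 1)
      .reduceByKey(_ + _)              // shuffle + reducer: sum counts per key

    counts.saveAsTextFile("hdfs:///output/word_counts")    // hypothetical output
    spark.stop()
  }
}
```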
Environment: Hadoop, HDFS, Hive, Map Reduce, AWS Ec2, SOLR, Impala, MySQL, Oracle, Sqoop, Kafka, Spark, SQL Talend, Python, PySpark, Yarn, Pig, Oozie, SBT, Akka, Linux-Ubuntu, Scala,Ab Initio, Tableau, Maven, Jenkins, Java (JDK 1.6), Cloudera, JUnit, agile methodologies
Confidential, Mattoon, IL
Hadoop Developer
Responsibilities:
- Experienced in migrating and transforming large sets of structured, semi-structured and unstructured raw data from HBase through Sqoop, placed in HDFS for further processing.
- Worked with the Cloudera support team to fine-tune the cluster.
- Extracted data on customers' everyday transactions from DB2, exported it to Hive and set up online analytical processing.
- Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other codec file formats.
- Wrote Java programs to retrieve data from HDFS and provide it to REST services.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
- Implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
- Implemented partitioning, bucketing in Hive for better organization of the data.
- Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
- Created HBase tables, used HBase sinks and loaded data into them to perform analytics using Tableau.
- Installed, configured and maintained Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Created multiple Hive tables and ran Hive queries on the data; implemented partitioning, dynamic partitioning and bucketing in Hive for efficient data access (a sketch follows this list).
- Experienced in running batch processes using Pig Latin Scripts and developed Pig UDFs for data manipulation according to Business Requirements.
- Hands-on experience in developing optimal strategies for distributing web log data over the cluster, and importing and exporting stored web log data into HDFS and Hive using Sqoop.
- Developed several REST web services which produces both XML and JSON to perform tasks, leveraged by both web and mobile applications.
- Developed Unit test cases for Hadoop M-R jobs and driver classes with MR Testing library.
- Continuously monitored and managed the Hadoop cluster using Cloudera manager and Web UI.
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 10g database.
- Managed and scheduled several jobs to run over a time on Hadoop cluster using oozie.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
- Used MAVEN for building jar files of MapReduce programs and deployed to cluster.
- Involved in final reporting data using Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector.
- Worked on various compression techniques like GZIP and LZO.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Performed Cluster tasks like adding, removing of nodes without any effect on running jobs.
- Installed Qlik Sense Desktop 2.x, developed applications for users and made reports using QlikView.
- Configured different Qlik Sense roles and attribute based access control.
- Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
- Helped in design of Scalable Big Data Clusters and solutions and involved in defect meetings.
- Followed Agile Methodology for entire project and supported testing teams.
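A hedged sketch of the Hive partitioning described above. The actual work used HiveQL directly; Spark's Hive support is used here only to keep all examples in one language, and the table and column names are hypothetical.

```scala
// Sketch: create a partitioned Hive table and perform a dynamic-partition insert.
// Executed through Spark's Hive support; table/column names are hypothetical.
// Bucketing was applied in HiveQL with CLUSTERED BY (...) INTO n BUCKETS.
import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned table so queries can prune by txn_date.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS transactions_part (
        |  account_id STRING,
        |  amount     DOUBLE
        |)
        |PARTITIONED BY (txn_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Dynamic-partition insert from a hypothetical staging table.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT INTO TABLE transactions_part PARTITION (txn_date)
        |SELECT account_id, amount, txn_date FROM staging_transactions""".stripMargin)

    spark.stop()
  }
}
```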
Environment: Apache Hadoop, MapReduce, HDFS, HBase, CentOS 6.4, Unix, REST web Services, ANT 1.6, Elastic Search, Hive, Pig, Oozie, Java (jdk 1.5), JSON, Eclipse, Qlik view, Qlik Sense, Oracle Database, Jenkins, Maven, Sqoop.
Confidential
Java Developer
Responsibilities:
- Developed rules based on different state policies using Spring MVC, iBatis ORM, Spring Web Flow, JSP, JSTL, Oracle, MSSQL, SOA, XML, XSD, JSON, AJAX and Log4j.
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, design, development and testing.
- Generated the use case diagrams, Activity diagrams, Class diagrams and Sequence Diagrams in the design phase using Star UML tool.
- Worked on the agile methodology basis in the project.
- Used Maven as build tool and deploying the application.
- Developed the user interface using the Spring framework, jQuery and Ajax.
- Used Spring framework AOP features and JDBC module features to persist the data to the database for few applications. Also used the Spring IOC feature to get hibernate session factory and resolve other bean dependencies.
- Involved in SSH key hashing and SFTP transfer of files.
- Extensively worked with Apache libraries for developing custom web services.
- Developed the persistence layer using Hibernate Framework by configuring the mappings in hibernate mapping files and created DAO and PO.
- Developed various Java beans for performance of business processes, effectively involved in impact analysis, and developed test cases using JUnit and Test-Driven Development.
- Developed application service components and configured beans using Spring IOC, creation of Hibernate mapping files and generation of database schema.
- Created RESTful web services interface to Java-based runtime engine and accounts.
- Done thorough code walk through for the team members to check the functional coverage and coding standards.
- Actively involved in writing SQL using SQL query builder.
- Actively used the defect tracking tool JIRA to create and track the defects during QA phase of the project.
- Used Tortoise SVN to maintain the version of the files and took the responsibility to do the code merges from branch to trunk and creating new branch when new feature implementation starts.
- Used DAO pattern to retrieve the data from database.
- Worked with WebSphere application server that handles various requests from Client.
Environment: Java/J2EE, JSP, XML, Spring Framework, Hibernate, Eclipse(IDE), Micro Services, Java Script, Struts, Tiles, Ant, SQL, PL/SQL, Oracle, Windows, UNIX, Soap, Jasper reports.