Sr. Hadoop/Spark Developer Resume
Oklahoma
SUMMARY
- 8+ years of IT experience in software design, development, implementation, and support of business applications for the health, insurance, and telecom industries.
- 4+ years of experience in Linux and Big Data Hadoop and Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Oozie, and ZooKeeper.
- Good experience in the Hadoop framework and related technologies such as HDFS, MapReduce, Pig, Hive, HBase, Sqoop, and Oozie.
- Hands-on experience with data extraction, transformation, and loading in Hive, Pig, and HBase.
- Experience in the successful implementation of ETL solutions between OLTP and OLAP databases in support of Decision Support Systems, with expertise in all phases of the SDLC.
- Experience creating DStreams from sources such as Flume and Kafka and applying various Spark transformations and actions to them (a brief sketch follows this summary).
- Experience in integrating Apache Kafka with Apache Storm and created Storm data pipelines for real time processing.
- Integration of BI tools (Spotfire, Crystal Reports, Lumira, Tableau) with Hadoop.
- Worked on improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, RDDs, and Spark on YARN.
- Delivery experience with major Hadoop ecosystem components such as Pig, Hive, Spark, Kafka, Elasticsearch, and HBase, with monitoring through Cloudera Manager. Extensive working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Procedural knowledge in cleansing and analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Experience transferring data from AWS S3 to AWS Redshift using Informatica.
- Experienced in building data pipelines using Kafka and Akka to handle terabytes of data.
- Hands-on experience with Solr to index files directly from HDFS for both structured and semi-structured data.
- Strong experience in RDBMS technologies such as MySQL, Oracle, Postgres, and DB2.
- Training and knowledge in Mahout and Spark MLlib for data classification, regression analysis, recommendation engines, and anomaly detection.
- Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
- Experience in extensive usage of Struts, HTML, CSS, JSP, jQuery, AJAX, and JavaScript for interactive pages.
- Involved in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Hands-on experience and knowledge of Lumira, regex, sed, Maven, Log4j, JUnit, and Ant.
- Hands-on experience with Hortonworks and the Cloudera Distribution of Hadoop (CDH).
- Worked with Apache NiFi flows to convert raw XML data into JSON and Avro.
- Experience in understanding security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Experience with predictive intelligence and maintenance in Spark Streaming using Conviva and Spark MLlib.
- Experience configuring deployment environments for applications using Jetty server, WebLogic 10, and a Postgres database at the back end.
- Involved in installing the Cloudera distribution of Hadoop on Amazon EC2 instances.
- Expertise with the Spark engine, creating batch jobs with incremental loads from HDFS/S3, Kinesis, sockets, and other AWS sources.
- Used Sqoop to import data from MySQL into S3 buckets on a regular basis.
- Hands on experience in using BI tools like Splunk/Hunk, Tableau.
- Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.
- Experience with object-oriented programming (OOP) concepts using Python, C++, and PHP.
- Experience in deployment of Big Data solutions and the underlying infrastructure of Hadoop Cluster using Cloudera, MapR and Hortonworks distributions.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa
- Analyzed SQL scripts and designed solutions to implement them using PySpark.
- Experience of MPP databases such as HP Vertica and Impala.
- Hands-on experience with SVN and GitHub.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Hands-on experience with projects implemented using Agile and Waterfall methodologies.
- Strong experience with data warehousing ETL concepts using Informatica PowerCenter, OLAP, OLTP, and AutoSys.
- Experienced in integrating Hadoop clusters with the Spark engine to perform batch and GraphX operations.
- Experienced in integrating various data sources such as Java applications, RDBMS, shell scripts, spreadsheets, and text files.
- Performed analytics in Hive using various file formats such as JSON, Avro, ORC, and Parquet.
- Working knowledge of databases such as Oracle 8i/9i/10g, Microsoft SQL Server, DB2, and Netezza.
- Experience in NoSQL databases like HBase, Cassandra, Redis and MongoDB.
- Experience and hands-on knowledge of Akka and the Lift framework.
- Experience using design patterns, Java, JSP, Servlets, JavaScript, HTML, jQuery, AngularJS, jQuery Mobile, JBoss 4.2.3, XML, WebLogic, SQL, PL/SQL, JUnit, Apache Tomcat, and Linux.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
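
For illustration, below is a minimal Scala sketch of the kind of Kafka-backed DStream pipeline referenced in this summary, written against the spark-streaming-kafka-0-10 API. The broker address, topic name, group id, and word-count logic are assumptions made up for the example, not details of any specific project.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaDStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-dstream-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Illustrative Kafka settings; broker list, group id and topic are assumptions
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "resume-sketch",
      "auto.offset.reset"  -> "latest"
    )

    // Direct DStream over the hypothetical "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    // Typical transformations followed by an action on each micro-batch
    stream.map(_.value)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```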
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++
No SQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Development Methodology: Agile, waterfall
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2
Operating systems: UNIX, Linux, macOS and Windows variants
Data analytical tools: R and MATLAB
ETL Tools: Talend, Informatica, Pentaho
PROFESSIONAL EXPERIENCE
Sr. Hadoop/Spark Developer
Confidential, Oklahoma
Responsibilities:
- Experienced in designing and deploying Hadoop clusters and various Big Data components including HDFS, MapReduce, Hive, Sqoop, Pig, Oozie, and ZooKeeper in the Cloudera distribution.
- Involved in loading and transforming large datasets between relational databases and HDFS using Sqoop imports and exports.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins (a brief sketch follows this role).
- Responsible for loading data pipelines from web servers and Teradata using Sqoop with the Kafka and Spark Streaming APIs.
- Hands-on experience with Amazon EC2, Amazon S3 for the computing and storage of data.
- Excellent understanding of Amazon Web Services (AWS) offerings such as EC2, S3, EBS, RDS, and VPC.
- Performed data validation between Hive target tables and Redshift source tables using automation scripts.
- Designed and published visually rich and intuitive Tableau dashboards and Crystal Reports for executive decision making.
- Involved in creating custom UDFs for Pig and Hive to bring Python functionality into Pig Latin and HiveQL.
- Extensively worked with Text, ORC, Avro, and Parquet file formats and compression techniques such as Snappy, Gzip, and Zlib.
- Wrote ETL jobs to read from web APIs using REST/HTTP calls and load the data into HDFS using Java and Talend.
- Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance and storage improvements, utilizing Hive SerDes such as regex, JSON, and Avro.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Hands-on experience using the MapReduce programming model for batch processing of data stored in HDFS.
- Good knowledge in using Splunk for data monitoring and visualization.
- Ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra per business requirements; also used Cassandra through Java services.
- Experience in NoSQL column-oriented databases such as Cassandra and their integration with the Hadoop cluster.
- Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in Hive queries.
- Experienced in creating multiple kinds of reports in Power BI and presenting them using story points.
- Very good understanding and working knowledge of object-oriented programming (OOP), Python, and Scala.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
- Used Hive queries in Spark SQL for analyzing and processing the data; used Scala to perform transformations and apply business logic.
- Developed Solr web apps to query and visualize Solr-indexed data from HDFS.
- Wrote shell scripts that run multiple Hive jobs to incrementally load different Hive tables, which are used to generate reports in Tableau for business use.
- Developed PySpark code to read data from Hive, group the fields, and generate XML files.
- Enhanced the PySpark code to write the generated XML files to a directory and zip them into CDAs.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Worked entirely in Agile methodology and developed Spark scripts using the Scala shell.
Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, Cassandra, Oozie, Shell Scripting, Scala, Maven, Java, JUnit, Agile methodologies, MySQL, Tableau, AWS, EC2, S3, Cloudera, Power BI, Solr, PySpark.
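
As a hedged illustration of the state-based partitioning and bucketing mentioned in this role, the following Scala sketch uses Spark's DataFrame writer with Hive support to produce a state-partitioned, customer-bucketed table that later joins can exploit. The table names (claims_staging, claims_by_state, customers_by_id) and column names are hypothetical, not taken from the actual project.

```scala
import org.apache.spark.sql.SparkSession

object HiveBucketingSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark read and write tables registered in the Hive metastore
    val spark = SparkSession.builder()
      .appName("bucketing-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging table already present in the metastore
    val claims = spark.table("claims_staging")

    // Partition by state and bucket by customer_id so that later joins on
    // customer_id can take advantage of co-bucketed tables
    claims.write
      .partitionBy("state")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("orc")
      .mode("overwrite")
      .saveAsTable("claims_by_state")

    // Example of a join that benefits from the bucketing above
    val enriched = spark.table("claims_by_state")
      .join(spark.table("customers_by_id"), "customer_id")
    enriched.show(10)

    spark.stop()
  }
}
```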
Hadoop Developer
Confidential, Lawrenceville
Responsibilities:
- Responsible for planning, organizing, and implementation of complex business solutions, producing deliverables within stipulated time.
- Worked with Linux systems and RDBMS database on a regular basis in order to ingest data using Sqoop.
- Experience using cloud-based services (AWS) to retrieve data.
- Hands-on expertise in Scala programming for processing real-time information using Spark APIs in the cloud environment.
- Using Kafka brokers, initiated the Spark context and processed live streaming information with the help of RDDs.
- Worked on Spark using Python and Spark SQL for faster testing and processing of data.
- Involved in enabling Amazon Kinesis Firehose to capture streaming data directly to S3 and Redshift; it automatically scales to match data throughput and requires no ongoing administration.
- Developed and maintained continuous integration and deployment systems using Jenkins, Ant, Akka, and Maven.
- Used Akka as a framework to create reactive, distributed, parallel, and resilient concurrent applications in Scala (a brief sketch follows this role).
- Involved in building a scalable email system using Amazon Simple Email Service, S3, and Akka actors to handle heavy email loads.
- Experience supporting multi-region AWS cloud deployments and creating placement groups to maintain clusters of instances.
- Installed and configured Talend ETL in single- and multi-server environments.
- Experience creating, dropping, and altering tables at runtime without blocking updates and queries, using HBase and Hive.
- Developed ETL test scripts based on technical specifications/Data design documents and Source to Target mappings.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Hands-on experience with Hortonworks tools such as Tez and Ambari.
- Worked on Apache NiFi as an ETL tool for batch and real-time processing.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Wrote the user console page in Lift along with snippets in Scala; the product gives users access to all their credentials and privileges within the system.
- Used Oozie workflow engine to create the workflows and automate the MapReduce, Hive, Pig jobs.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop; cluster coordination through ZooKeeper.
- Used a lambda expression to improve SackEmployees further and avoid the need for a separate class.
- Developed Unix shell scripts to load large numbers of files into HDFS from the Linux file system.
- Experience creating Hive tables using HiveQL.
- Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
Environment: Hadoop ecosystem components, ETL, Spark, Kafka, Python, Shell Scripting, SQL, Talend, Elasticsearch, Linux (Ubuntu), AWS, Hortonworks, MongoDB, VPC, Lambda, Hive, Zookeeper, Pig, Sqoop, Oozie, Tez, Ambari, YARN, Akka, Jenkins, Kinesis, Ant, MapReduce.
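
A toy sketch of the Akka actor style referenced in this role, using the classic (untyped) Akka API in Scala. The EmailWorker actor and its message types are hypothetical stand-ins for the domain-specific actors of the real email system.

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Illustrative message protocol; the real system's messages were domain specific
final case class SendEmail(to: String, body: String)
case object Shutdown

// A worker actor that handles email requests one message at a time
class EmailWorker extends Actor {
  def receive: Receive = {
    case SendEmail(to, body) =>
      // In the real system this would call SES or an SMTP gateway
      println(s"sending ${body.length} chars to $to")
    case Shutdown =>
      context.system.terminate()
  }
}

object AkkaSketch extends App {
  val system = ActorSystem("email-sketch")
  // A pool of workers would normally sit behind a router for parallelism
  val worker = system.actorOf(Props(new EmailWorker), "worker")

  worker ! SendEmail("user@example.com", "hello")
  worker ! Shutdown
}
```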
Hadoop Developer
Confidential, Austin
Responsibilities:
- Hands-on experience loading data from the UNIX file system and Teradata into HDFS.
- Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
- Developed PIG scripts for the analysis of semi structured data.
- Developed Java MapReduce programs on log data to transform it into a structured form and derive user location, age group, and time spent.
- Collected and aggregated large amounts of web log data from different sources such as webservers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis
- Created a complete processing engine, based on Cloudera's distribution, tuned for performance.
- Worked in Eclipse using the Maven plugin for the Eclipse IDE.
- Created HBase tables to store variable data formats coming from different portfolios; performed real-time analytics on HBase using the Java API and REST API.
- Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Developed ETL using Hive, Oozie, shell scripts, and Sqoop; used Scala for coding components and utilized Scala pattern matching (a brief sketch follows this role).
- Supported data analysts in running MapReduce programs.
- Implemented NameNode backup using NFS for high availability.
- Analyzed the weblog data using HiveQL.
- Experience writing Pig scripts for sorting, joining, and grouping data.
- Experienced in working with Avro data files using the Avro serialization system.
Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Java, Maven, Avro, Cloudera, Eclipse, Unix Shell Scripting, Oozie, ETL, Scala.
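
A small Scala sketch of the pattern-matching style of log parsing used in this role; the access-log regex, field names, and Access case class are assumptions made for illustration only.

```scala
object LogParseSketch {
  // Simplified Apache-style access-log pattern; real formats vary
  private val LogLine =
    """(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\d+|-)""".r

  // One parsed record per matching line
  final case class Access(ip: String, ts: String, method: String, path: String, status: Int)

  // Regex extractor plus pattern matching turns raw lines into structured records
  def parse(line: String): Option[Access] = line match {
    case LogLine(ip, ts, method, path, status, _) =>
      Some(Access(ip, ts, method, path, status.toInt))
    case _ => None // malformed lines are dropped rather than throwing
  }

  def main(args: Array[String]): Unit = {
    val sample = """10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326"""
    println(parse(sample)) // Some(Access(10.0.0.1, ..., GET, /index.html, 200))
  }
}
```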
Java Developer
Confidential
Responsibilities:
- Developed Use Case diagrams, business flow diagrams, Activity/State diagrams.
- Designed and implemented the training and reports modules of the application using Servlets, JSP, and Ajax.
- Developed XML Web Services using SOAP, WSDL, and UDDI.
- Interacted with business users and developed custom reports based on the defined criteria.
- Gathered requirements and collected information; analyzed the gathered information to prepare a detailed work plan and task breakdown structure.
- Experience with the SDLC and involvement in all of its phases.
- Implemented applications using Java, J2EE, JSP, Servlets, JDBC, RAD, XML, HTML, XHTML, Hibernate, Struts, Spring, and JavaScript on Windows environments.
- Developed action Servlets and JSPs for presentation in Struts MVC framework.
- Developed a PL/SQL view function in the Oracle 9i database for the get-available-date module.
- Used Oracle SQL 4.0 as the database and wrote SQL queries in the DAO Layer.
- Used RESTful services to interact with the client by providing RESTful URL mappings.
- Implemented the project using Agile Scrum methodology; involved in daily stand-up meetings, sprint showcases, and sprint retrospectives.
- Used SVN and GitHub as version control tool.
- Experienced in developing web-based applications using Python, Django, PHP, XML, CSS, HTML, JavaScript, and jQuery.
- Developed presentation layer using HTML, JSP, Ajax, CSS and JQuery.
- Worked with Struts MVC objects like Action Servlet, Controllers, validators, Web Application Context, Handler Mapping, Message Resource Bundles, Form Controller, and JNDI for look-up for J2EE components.
- Experience with JIRA; tracked test results and interacted with developers to resolve issues.
- Created the UI tool using Java, XML, XSLT, DHTML, and JavaScript.
- Used XSLT to transform XML data structures into HTML pages.
- Deployed EJB Components on Tomcat. Used JDBC API for interaction with Oracle DB.
- Wrote build and deployment scripts using shell, Perl, and Ant.
- Extensively used Java multi-threading to implement batch jobs with JDK 1.5 features.
- Experience in application development using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, Web Services, SOAP, and WSDL.
- Implemented Hibernate in the data access object layer to access and update information in the Oracle 10g database.
Environment: HTML, JavaScript, Ajax, Servlets, JSP, SOAP, SDLC, Java, Hibernate, Scrum, JIRA, GitHub, jQuery, CSS, XML, Ant, Tomcat Server, Jasper Reports.
Java Developer
Confidential
Responsibilities:
- Involved in the design, development, and deployment of the application using Java/J2EE technologies.
- Developed web components using JSP, Servlets, and JDBC, and coded JavaScript for AJAX and client-side data validation.
- Designed and Developed mappings using different transformations like Source Qualifier, Expression, Lookup (Connected & Unconnected), Aggregator, Router, Rank, Filter and Sequence Generator.
- Imported data from various Sources transformed and loaded into Data Warehouse Targets using Informatica Power Center.
- Made substantial contributions to simplifying ETL development and maintenance by creating reusable Source, Target, Mapplet, and Transformation objects.
- Developed JSP pages using Custom tags and Tiles framework and Struts framework.
- Used different user interface technologies JSP, HTML, CSS, and JavaScript for developing the GUI of the application.
- Gained skills with web-based REST APIs, SOAP APIs, and Apache for real-time data streaming.
- Programmed Oracle SQL, T-SQL Stored Procedures, Functions, Triggers and Packages as back-end processes to create and update staging tables, log and audit tables, and creating primary keys.
- Extensively used Transformations like Aggregator, Router, Joiner, Expression, Lookup, Update Strategy, and Sequence Generator.
- Developed mappings, sessions and workflows using Informatica Designer and Workflow Manager based on source to target mapping documents to transform and load data into dimension tables.
- Used FTP services to retrieve Flat Files from the external sources.
Environment: Java, Ajax, Informatica Power Center 8.x/9.x, REST API, SOAP API, Apache, Oracle 10g/11g, SQL Loader, MS SQL Server, Flat Files, Targets, Aggregator, Router, Sequence Generator.