Sr. Hadoop/Spark Developer Resume
Colorado
SUMMARY
- 8+ years of IT experience in software design, implementation, development, and support of business applications for the health, insurance, and telecom industries.
- 4+ years of experience in Linux and Big Data Hadoop, including Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Oozie, and ZooKeeper.
- Hands-on experience with data extraction, transformation, and loading in Hive, Pig, and HBase.
- Experience in creating DStreams from sources such as Flume and Kafka and applying various Spark transformations and actions to them.
- Designed AWS CloudFormation templates to create VPCs, subnets, and NAT gateways to support the deployment of web applications and databases.
- Experience integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
- Integrated BI tools (Spotfire, Crystal Reports, Lumira, Tableau) with Hadoop.
- Strong experience in designing and developing various data mart and data warehouse applications using custom ETL frameworks developed in Java and Python.
- Improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, RDDs, and Spark on YARN.
- Practical knowledge of cleansing and analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience transferring data from AWS S3 to AWS Redshift using Informatica.
- Experienced in building data pipelines with Kafka and Akka to handle terabytes of data.
- Hands-on experience with Solr, indexing files directly from HDFS for both structured and semi-structured data.
- Strong experience in RDBMS technologies such as MySQL, Oracle, Postgres, and DB2.
- Training and knowledge in Mahout and Spark MLlib for data classification, regression analysis, recommendation engines, and anomaly detection.
- Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
- Extensive experience using Struts, HTML, CSS, JSP, jQuery, AJAX, and JavaScript to build interactive pages.
- Configured and worked with Flume to load data from multiple sources directly into HDFS.
- Hands-on experience and knowledge of Lumira, regex, sed, Maven, Log4j, JUnit, and Ant.
- Hands-on experience with Hortonworks and Cloudera Distribution of Hadoop (CDH).
- Worked with Apache NiFi flows to convert raw XML data into JSON and Avro.
- Experience in understanding security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Experience with predictive intelligence and maintenance on Spark Streaming using Conviva and Spark MLlib.
- Experience configuring deployment environments using the Jetty server and WebLogic 10 with a Postgres database at the back end.
- Installed the Cloudera distribution of Hadoop on Amazon EC2 instances.
- Expertise with the Spark engine, creating batch jobs with incremental loads from HDFS, S3, Kinesis, and sockets on AWS.
- Imported data from MySQL into S3 buckets on a regular basis using Sqoop.
- Experience installing, upgrading, and configuring Red Hat Linux 4.x, 5.x, and 6.x using Kickstart servers.
- Worked on AWS services such as EMR and EC2 for fast and efficient processing of Big Data.
- Expert in Java 8 lambdas, streams, and type annotations.
- Experience in deploying Big Data solutions and the underlying infrastructure of Hadoop clusters using the Cloudera, MapR, and Hortonworks distributions.
- Experience with MPP databases such as HP Vertica and Impala.
- Hands-on experience with SVN and GitHub.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala (see the sketch after this list).
- Hands-on experience on projects run with Agile and Waterfall methodologies.
- Strong experience with data warehousing ETL concepts using Informatica PowerCenter, OLAP, OLTP, and AutoSys.
- Experienced in integrating various data sources, including Java applications, RDBMS, shell scripts, spreadsheets, and text files.
- Performed analytics in Hive using various file formats such as JSON, Avro, ORC, and Parquet.
- Working knowledge of databases such as Oracle 8i/9i/10g, Microsoft SQL Server, DB2, and Netezza.
- Experience in NoSQL databases like HBase, Cassandra, Redis and MongoDB.
- Experience and hands-on knowledge of Akka and the Lift framework.
- Experience using design patterns, Java, JSP, Servlets, JavaScript, HTML, jQuery, AngularJS, jQuery Mobile, JBoss 4.2.3, XML, WebLogic, SQL, PL/SQL, JUnit, Apache Tomcat, and Linux.
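As referenced above, the following is a minimal sketch of converting a HiveQL aggregation into Spark SQL/DataFrame transformations; the database, table, and column names are hypothetical placeholders for the actual warehouse objects.

```scala
// Minimal sketch: rewrite a Hive aggregation as Spark DataFrame transformations.
// All database/table/column names (healthcare.claims, claim_amount, ...) are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClaimsAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClaimsAggregation")
      .enableHiveSupport()          // read tables registered in the Hive metastore
      .getOrCreate()

    // Original HiveQL: SELECT member_id, SUM(claim_amount) FROM claims GROUP BY member_id
    val claims = spark.table("healthcare.claims")

    val totals = claims
      .filter(col("claim_amount") > 0)                        // drop bad records
      .groupBy("member_id")
      .agg(sum("claim_amount").alias("total_claim_amount"))

    // Persist the result back to Hive for downstream reporting.
    totals.write.mode("overwrite").saveAsTable("healthcare.claim_totals")

    spark.stop()
  }
}
```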
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Development Methodology: Agile, Waterfall
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2
Operating systems: UNIX, Linux, macOS and Windows variants
Data analytical tools: R and MATLAB
ETL Tools: Talend, Informatica, Pentaho
PROFESSIONAL EXPERIENCE
Sr. Hadoop/Spark Developer
Confidential, Colorado
Responsibilities:
- Designed and deployed Hadoop clusters and various Big Data analytics tools, including Pig, Hive, Cassandra, Oozie, Sqoop, Kafka, Spark, and Impala, on the Hortonworks distribution.
- Performed source data transformations using Hive.
- Supported an infrastructure environment comprising RHEL and Solaris.
- Involved in developing a MapReduce framework that filters out bad and unnecessary records.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used Kafka to transfer data from different data systems to HDFS.
- Created Spark jobs to identify trends in data usage by users.
- Responsible for generating actionable insights from complex data to drive real business results for various application teams.
- Designed the column families in Cassandra.
- Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra per business requirements.
- Developed Spark code using Scala and Spark SQL for faster processing and testing.
- Experience with column-oriented NoSQL databases such as Cassandra and their integration with the Hadoop cluster.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
- Exported the analyzed data to relational databases using Sqoop for further visualization and report generation by the BI team.
- Good experience with Talend Open Studio for designing ETL jobs for data processing.
- Experience processing large volumes of data and executing processes in parallel using Talend functionality.
- Worked on different file formats like Text files and Avro.
- Worked on Agile Methodology projects extensively.
- Designed NiFi flows to pull data from various sources and push it into HDFS and Cassandra.
- Experience designing and executing time-driven and data-driven Oozie workflows.
- Setting up Kerberos principals and testing HDFS, Hive, Pig, and MapReduce access for the new users.
- Experienced in working with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.
- Used the Log4j framework for logging debug, info, and error data.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Implemented MapReduce counters to gather metrics on good and bad records.
- Work experience with cloud infrastructure such as Amazon Web Services (AWS).
- Developed custom UDFs in Java to extend Hive and Pig functionality.
- Imported data from various sources such as mainframes, Teradata, Oracle, and Netezza using Sqoop and SFTP, performed transformations using Hive, Pig, and Spark, and loaded the data into HDFS.
- Extracted data from Teradata into HDFS, databases, and dashboards using Spark Streaming.
- Implemented best-income logic using Pig scripts.
- Worked with different file formats (ORC, Parquet, Avro) and compression codecs (GZIP, Snappy, LZO).
- Created applications using Kafka, which monitors consumer lag within Apache Kafka clusters. Used in production by multiple companies.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that receives data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
- Designed and documented REST/HTTP and SOAP APIs, including JSON data formats and the API versioning strategy.
- Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management.
- Analyzed the performance of Spark Streaming and batch jobs using Spark tuning parameters.
- Used React bindings for Redux.
- Created real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
- Worked on an Express view engine that renders React components on the server.
- Worked along with the Hadoop Operations team in Hadoop cluster planning, installation, maintenance, monitoring and upgrades.
- Used the file system check (fsck) to verify the health of files in HDFS.
- Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
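The streaming pipeline referenced above (Kafka into Spark Streaming, persisted to Cassandra) can be sketched as follows. This is a minimal illustration assuming the spark-streaming-kafka-0-10 and spark-cassandra-connector libraries; broker addresses, the topic name, and the keyspace/table/column names are hypothetical.

```scala
// Minimal sketch: consume learner events from Kafka and persist them to Cassandra.
// Broker, topic, keyspace, table, and column names are hypothetical.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

object LearnerEventStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("LearnerEventStream")
      .set("spark.cassandra.connection.host", "cassandra-host")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "learner-model")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams))

    // Parse CSV-like records "userId,courseId,score" and keep only well-formed rows.
    stream.map(_.value.split(","))
      .filter(_.length == 3)
      .map(f => (f(0), f(1), f(2).toDouble))
      .saveToCassandra("learner", "events", SomeColumns("user_id", "course_id", "score"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```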
Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, Cassandra, Oozie, Shell Scripting, Scala, Maven, Java, React JS, JUnit, Agile methodologies, Hortonworks, SOAP, NiFi, Teradata, MySQL.
Hadoop Developer
Confidential, Maryland
Responsibilities:
- Developed custom UDFs in Java to extend Hive and Pig Latin functionality.
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Imported and exported data between an Oracle 10.2 database and HDFS using Sqoop.
- Installed and configured Pig and wrote Pig Latin scripts.
- Designed and implemented Hive queries and functions for evaluating, filtering, loading, and storing data.
- Created HBase tables and column families to store the user event data.
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Developed a data pipeline using HBase, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
- Collected log data from different sources (web servers and social media) using Flume and stored it in HDFS to run MapReduce jobs.
- Handled importing data from machine logs using Flume.
- Created Hive tables and loaded data from Teradata using Sqoop.
- Worked extensively on importing metadata into Hive and migrating existing tables and applications to Hive and the AWS cloud.
- Configured, monitored, and optimized Flume agent to capture web logs from the VPN server to be put into Hadoop Data Lake.
- Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and wrote Pig/Hive UDFs.
- Wrote, tested, and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, DML, and DDL.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Collected and aggregated large amounts of log data on EC2 using Flume and staged the data in HDFS for further analysis.
- Exported the analyzed data to relational databases using Sqoop for further visualization and report generation by the BI team.
- Developed ETL processes using Spark, Scala, Hive, and HBase (see the sketch after this list).
- Wrote Java code to format XML documents and upload them to a Solr server for indexing.
- Used NoSQL technology (Amazon DynamoDB) to gather and track event-based metrics.
- Maintained all the services in the Hadoop ecosystem using ZooKeeper.
- Worked on implementing the Spark framework.
- Designed and implemented Spark jobs to support distributed data processing.
- Expertise in extracting, transforming, and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files, and XML using Talend.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Followed agile methodology for the entire project.
- Experience in working with Hadoop clusters using Cloudera distributions.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
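A minimal sketch of the Spark/Scala ETL pattern referenced above (read a Hive table, aggregate, write to HBase) is shown below; the Hive and HBase table, column family, and column names are hypothetical, and the spark-sql and HBase client/mapreduce jars are assumed to be on the classpath.

```scala
// Minimal sketch: Spark ETL from Hive to HBase. Table and column names are hypothetical.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.SparkSession

object HiveToHBaseEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToHBaseEtl")
      .enableHiveSupport()
      .getOrCreate()

    // Aggregate customer events from a Hive table.
    val counts = spark.sql(
      """SELECT customer_id, event_type, COUNT(*) AS cnt
        |FROM behavior.events
        |GROUP BY customer_id, event_type""".stripMargin)

    // Configure the HBase output table.
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableOutputFormat.OUTPUT_TABLE, "customer_event_counts")
    val job = Job.getInstance(hbaseConf)
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

    // Row key = customer_id; one column per event type under family "e".
    counts.rdd.map { row =>
      val put = new Put(Bytes.toBytes(row.getString(0)))
      put.addColumn(Bytes.toBytes("e"), Bytes.toBytes(row.getString(1)), Bytes.toBytes(row.getLong(2)))
      (new ImmutableBytesWritable, put)
    }.saveAsNewAPIHadoopDataset(job.getConfiguration)

    spark.stop()
  }
}
```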
Environment: Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Oozie, Python, Shell Scripting, SQL, Talend, Spark, HBase, Elasticsearch, Linux (Ubuntu), Cloudera.
Hadoop Developer
Confidential
Responsibilities:
- Created a complete processing engine based on Cloudera's distribution, tuned for performance.
- Worked with Eclipse using the Maven plugin for the Eclipse IDE.
- Created HBase tables to store variable data formats coming from different portfolios; performed real-time analytics on HBase using the Java API and REST API.
- Wrote ETL jobs to read from web APIs using REST and HTTP calls and load the data into HDFS using Java and Talend.
- Developed ETL test scripts based on technical specifications/Data design documents and Source to Target mappings.
- Hands-on experience loading data from UNIX file systems and Teradata to HDFS.
- Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
- Developed Pig scripts for the analysis of semi-structured data.
- Developed Java MapReduce programs to transform log data into a structured form and derive user location, age group, and time spent.
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile, and network devices using Apache Flume and stored the data in HDFS for analysis.
- Used Informatica as an ETL tool to create source/target definitions, mappings and sessions to extract, transform and load data into staging tables from various sources.
- Developed ETL procedures to ensure conformity, compliance with standards and lack of redundancy, translating business rules and functionality requirements into ETL procedures.
- Supported data analysts in running MapReduce programs.
- Implemented NameNode backup using NFS for high availability.
- Analyzed the web log data using HiveQL.
- Wrote Pig scripts for sorting, joining, and grouping data.
- Experienced in working with Avro data files using the Avro serialization system.
- Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Developed ETL using Hive, Oozie, shell scripts, and Sqoop. Used Scala for coding the components and utilized Scala pattern matching (see the sketch after this list).
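A minimal sketch of the Scala pattern-matching style referenced above, used here to parse semi-structured web log lines before loading them downstream; the log layout and field names are hypothetical.

```scala
// Minimal sketch: parse semi-structured log lines with Scala pattern matching.
// The log format and field names are hypothetical.
object LogLineParser {

  case class PageView(userId: String, region: String, ageGroup: String, secondsOnPage: Int)

  // e.g. "u123|US-CO|25-34|42"
  private val LogPattern = """(\S+)\|(\S+)\|(\S+)\|(\d+)""".r

  def parse(line: String): Option[PageView] = line match {
    case LogPattern(user, region, age, seconds) =>
      Some(PageView(user, region, age, seconds.toInt))
    case _ =>
      None                                      // malformed records are dropped
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("u123|US-CO|25-34|42", "corrupt line")
    sample.flatMap(parse).foreach(println)      // keeps only well-formed records
  }
}
```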
Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Java, Maven, Avro, Cloudera, Eclipse, Unix Shell Scripting, Oozie, ETL, Scala.
Java Developer
Confidential
Responsibilities:
- Involved in the design, development, and deployment of the application using Java/J2EE technologies.
- Developed web components using JSP, Servlets, and JDBC, and coded JavaScript for AJAX and client-side data validation.
- Designed and Developed mappings using different transformations like Source Qualifier, Expression, Lookup (Connected & Unconnected), Aggregator, Router, Rank, Filter and Sequence Generator.
- Imported data from various Sources transformed and loaded into Data Warehouse Targets using Informatica Power Center.
- Made substantial contributions to simplifying ETL development and maintenance by creating reusable Source, Target, Mapplet, and Transformation objects.
- Experience in developing extract, transform, and load (ETL) processes and maintaining and supporting the enterprise data warehouse system and corresponding marts.
- Prepared the DR plan and recovery process for the GDW application.
- Developed JSP pages using custom tags, the Tiles framework, and the Struts framework.
- Used user interface technologies such as JSP, HTML, CSS, and JavaScript to develop the application GUI.
- Implemented Hibernate in the data access object layer to access and update information in the Oracle 10g database.
- Created TCL scripts for build, deployment, and execution in a Unix environment to automate the entire ETL process.
- Implemented the project using the Agile Scrum methodology; involved in daily stand-up meetings, sprint showcases, and sprint retrospectives.
- Used SVN and GitHub as version control tools.
- Experienced in developing web-based applications using Python, Django, PHP, XML, CSS, HTML, JavaScript, and jQuery.
Environment: HTML, JavaScript, AJAX, Servlets, JSP, SOAP, Java, Hibernate, Scrum, JIRA, GitHub, jQuery, CSS, XML, ETL, Oracle.
Java Developer
Confidential
Responsibilities:
- Developed Use Case diagrams, business flow diagrams, Activity/State diagrams.
- Designed and implemented the training and reports modules of the application using Servlets, JSP, and AJAX.
- Developed XML Web Services using SOAP, WSDL, and UDDI.
- Interacted with business users and developed custom reports based on the defined criteria.
- Gathered requirements and collected information; analyzed the gathered information to prepare a detailed work plan and task breakdown structure.
- Experience working through all phases of the SDLC.
- Implemented applications using Java, J2EE, JSP, Servlets, JDBC, RAD, XML, HTML, XHTML, Hibernate, Struts, Spring, and JavaScript on Windows environments.
- Developed action Servlets and JSPs for presentation in Struts MVC framework.
- Developed a PL/SQL view function in the Oracle 9i database for the get-available-date module.
- Used Oracle SQL 4.0 as the database and wrote SQL queries in the DAO layer.
- Used RESTful services to interact with the client by providing RESTful URL mappings.
- Developed ETL processes to load data from Flat files, SQL Server and Access into the target Oracle database by applying business logic on transformation mapping for inserting and updating records when loaded.
- Created the UI tool using Java, XML, XSLT, DHTML, and JavaScript.
- Wrote build and deployment scripts using shell, Perl, and Ant.
- Extensively used Java multithreading to implement batch jobs with JDK 1.5 features.
- Experience in application development using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, web services, SOAP, and WSDL.
Environment: HTML, JavaScript, AJAX, Servlets, JSP, SOAP, SDLC, Java, Hibernate, Scrum, JIRA, GitHub, jQuery, CSS, XML, ANT, ETL.
Java Developer
Confidential
Responsibilities:
- Developed the presentation layer using HTML, JSP, AJAX, CSS, and jQuery.
- Worked with Struts MVC objects such as ActionServlet, controllers, validators, Web Application Context, handler mappings, message resource bundles, and form controllers, and used JNDI for look-up of J2EE components.
- Gained skills in web-based REST APIs, SOAP APIs, and Apache tooling for real-time data streaming.
- Programmed Oracle SQL, T-SQL Stored Procedures, Functions, Triggers and Packages as back-end processes to create and update staging tables, log and audit tables, and creating primary keys.
- Extensively used Transformations like Aggregator, Router, Joiner, Expression, Lookup, Update Strategy, and Sequence Generator.
- Developed mappings, sessions and workflows using Informatica Designer and Workflow Manager based on source to target mapping documents to transform and load data into dimension tables.
- Used FTP services to retrieve Flat Files from the external sources.
- Used JIRA to track test results and interacted with developers to resolve issues.
Environment: Java, AJAX, REST API, SOAP API, Apache, Oracle, SQL Loader, HTML, JSP, jQuery.
