Sr. Hadoop/Spark Developer Resume

St Louis, MO

SUMMARY

  • 5+ years of extensive hands-on experience in the IT industry, including experience in deployment of Hadoop ecosystem components like MapReduce, YARN, Sqoop, Flume, Pig, Hive, HBase, Cassandra, ZooKeeper, Oozie, Ambari, BigQuery, and Bigtable, and 5+ years' experience with Spark, Storm, Scala, and Python.
  • Well versed in database and data warehouse concepts like OLTP, OLAP, Star schema, and Snowflake schema.
  • Strong knowledge of Hadoop cluster capacity planning, performance tuning, and cluster monitoring.
  • Extensive experience in business data science project life cycle including Data Acquisition, Data Cleaning, Data Manipulation, Data Validation, Data Mining, Machine Learning Algorithms, and Visualization.
  • Good hands-on experience installing and configuring Cloudera clusters and using Hadoop ecosystem components like Hadoop, Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume, and ZooKeeper.
  • In-depth understanding of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce concepts.
  • Strong knowledge in extending Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) for Hive.
  • Experience productionizing Apache NiFi for data flows with significant processing requirements and controlling data flow security.
  • Designed and developed RDD seeds using Scala and Cascading; streamed data into Spark Streaming using Kafka and stored it in HDFS.
  • Exposure to Spark, Spark Streaming, Spark MLlib, and Scala; implemented Spark applications in Scala using the DataFrame and Spark SQL APIs and pair RDDs for faster processing of data.
  • Good exposure to MapReduce programming using Java, Pig Latin scripting, distributed applications, and HDFS.
  • Hands on experience on NoSQL databases - HBase, Cassandra & MongoDB, database performance tuning & data modeling.
  • Good experience with Core Java concepts like Collections Framework, multithreading, memory management.
  • Experienced with Scala and Spark, improving the performance and optimizing existing algorithms in Hadoop using Spark Context, Spark SQL, pair RDDs, and Spark on YARN.
  • Experienced in installation, configuration, support, and management of Hadoop clusters using Apache Cloudera and Hortonworks distributions, cloud storage, and Amazon Web Services (AWS), along with related technologies such as DynamoDB, EMR, S3, and ML.
  • Experience in deploying NiFi data flows to production and integrating data from multiple sources like Cassandra and MongoDB.
  • Deployed NiFi templates to environments via the NiFi REST API integrated with other automation tools.
  • Completed end-to-end design and development of an Apache NiFi flow that acts as the agent between the middleware team and the EBI team and executes all of the actions mentioned above.
  • Hands on experience in writing Python scripts, wrote Web Crawlers using Python.
  • Experienced in working with Mahout for applying machine learning techniques in the Hadoop Ecosystem.
  • Good Experience on Amazon Web Services like Redshift, Data Pipeline, ML.
  • Excellent hands-on experience importing and exporting data between relational database systems like MySQL and Oracle and HDFS/Hive using Sqoop and traditional data movement technologies.
  • Experience integrating the Quartz scheduler with Oozie workflows to get data from multiple data sources in parallel using fork.
  • Experience in installation, configuration, support and management of a Hadoop Cluster using Cloudera Distributions.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Experience tuning Spark jobs by changing configuration properties and using broadcast variables.
  • Developed REST APIs using Java, the Play framework, and Akka.
  • Expertise in search technologies like SOLR, Informatica & Lucene.
  • Experience in converting SQL queries into Spark transformations using Spark RDDs and Scala and performing map-side joins on RDDs (see the Scala sketch at the end of this summary).
  • Experienced in writing Hadoop Jobs for analyzing data using Hive Query Language (HQL), Pig Latin (Data flow language), and custom MapReduce programs in Java.
  • Strong analytical skills with the ability to quickly understand clients' business needs; involved in business meetings for requirements gathering from business clients.
  • Experienced in building Storm topologies to perform cleansing operations before moving data into HBase.
  • Extensive Experience on importing and exporting data using stream processing platforms like Flume and Kafka.
  • Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Experience fully configuring Flume agents suitable for all types of log data, storing the data through an Avro sink in Parquet file format, and developing a two-tier architecture connecting channels between Avro sinks and sources.
  • Experience creating visual reports, graphical analysis, and dashboard reports using Tableau and Informatica on historical data saved in HDFS, and data analysis using Splunk Enterprise.
  • Good experience using version control services like Git; extensive knowledge of GitHub and Bitbucket. Good knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
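
The following is a minimal Scala sketch of the SQL-to-Spark conversion and map-side join pattern referenced above; the orders/customers data, column meanings, and app name are hypothetical and used only for illustration. The small lookup side is broadcast so the join runs inside each map task without a shuffle.

    import org.apache.spark.sql.SparkSession

    object BroadcastJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("sql-to-spark-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Hypothetical data: (customerId, amount) facts and a small (customerId, region) lookup.
        val orders    = sc.parallelize(Seq((1, 250.0), (2, 75.5), (1, 120.0)))
        val customers = sc.parallelize(Seq((1, "US-EAST"), (2, "EU-WEST")))

        // Rough equivalent of:
        //   SELECT region, SUM(amount) FROM orders JOIN customers USING (customerId) GROUP BY region
        // The lookup table is collected and broadcast, so the join is performed map-side in each task.
        val regionByCustomer = sc.broadcast(customers.collectAsMap())

        val revenueByRegion = orders
          .flatMap { case (custId, amount) =>
            regionByCustomer.value.get(custId).map(region => (region, amount)).toList
          }
          .reduceByKey(_ + _)

        revenueByRegion.collect().foreach(println)
        spark.stop()
      }
    }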

TECHNICAL SKILLS

Big Data Ecosystem: HDFS and MapReduce, Pig, Hive, Pig Latin, Impala, YARN, Oozie, ZooKeeper, Apache Spark, Apache Crunch, Apache NiFi, Apache Storm, Apache Kappa, Apache Kafka, Sqoop, Flume.

Databases: MongoDB, Netezza, SQL Server, MySQL, ORACLE, DB2.

Development Methodologies: Waterfall, UML, Design Pattern (Core Java and J2EE), Agile Methodologies (Scrum).

Frameworks: MVC, Struts, Hibernate, Spring Framework, Spring Boot.

IDE Development Tools: Eclipse, NetBeans, Visual Studio, Xcode, Android Studio, JetBrains IntelliJ IDEA.

Java Technologies: Java, J2EE, JDK 1.4/1.5/1.6/1.7/1.8, JDBC, Hibernate, XML Parsers, JSP 1.2/2, Servlets, EJB, JMS, Struts, Spring Framework, Java Beans, AJAX, JNDI.

Programming Languages: C, C++, Java, J2EE, JDBC, JUnit, Log4j, C#, Python, Scala, Swift, Android, PL/SQL, HQL, Unix, Shell Scripting.

NoSQL Databases: HBase, MongoDB, Cassandra.

Operating Systems: Windows, Linux, Unix.

Management Tech: SVN, Git, Jira, Maven.

Scripting Languages: Python, Perl, Shell, Scheme, Tcl, Unix Shell Scripts, Windows PowerShell

Virtualization Technologies: VMware ESXi, Windows Hyper-V, PowerVM, VirtualBox, Citrix Xen, KVM.

Web Technologies: HTML, JavaScript, jQuery, Ajax, Bootstrap, AngularJS, Node.js, Express.js.

Web Servers: WebLogic, WebSphere, Apache Tomcat, JBoss.

Web Services: SOAP, RESTful API, WSDL

PROFESSIONAL EXPERIENCE

Confidential, St Louis, MO

Sr. Hadoop/Spark Developer

RESPONSIBILITIES:

  • Involved in deploying systems on Amazon Web Services (AWS) Infrastructure services EC2.
  • Experience in configuring, deploying the web applications on AWS servers using SBT and Play.
  • Migrated Map Reduce jobs into Spark RDD transformations using Scala.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Developed Spark code using Spark RDD and Spark-SQL/Streaming for faster processing of data.
  • Performed configuration, deployment and support of cloud services including Amazon Web Services (AWS).
  • Working knowledge of various AWS technologies like SQS queuing, SNS notifications, S3 storage, Redshift, Data Pipeline, and EMR.
  • Responsible for all public (AWS) and private (OpenStack/VMware/DC/OS/Mesos/Marathon) cloud infrastructure.
  • Developed a Flume ETL job handling data from an HTTP source with an HDFS sink and configured data pipelining.
  • Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
  • Involved in developing a RESTful service using the Python Flask framework.
  • Expertise in working with Python GUI frameworks - PyJamas, Jython.
  • Experienced in using Apache Drill data-intensive distributed applications for interactive analysis of large-scale datasets.
  • Developed end-to-end ETL batch and streaming data integration into Hadoop (MapR), transforming data.
  • Used Python modules such as requests, urllib, and urllib2 for web crawling.
  • Tools used extensively include Spark, Drill, Hive, HBase, Kafka & MapR Streams, PostgreSQL, and StreamSets.
  • Used Hive Queries in Spark-SQL for analysis and processing the data.
  • Played a key role in a team developing an initial prototype of a NiFi big data pipeline, which demonstrated an end-to-end scenario of data ingestion and processing.
  • Used HUE for running Hive queries. Created Partitions according to day using Hive to improve performance.
  • Wrote Python routines to log into websites and fetch data for selected options.
  • Worked on custom Pig Loaders and storage classes to work with variety of data formats such as JSON and XML file formats.
  • Loaded some of the data into Cassandra for fast retrieval of data.
  • Worked on provisioning and managing multi-tenant Hadoop clusters on a public cloud environment - Amazon Web Services (AWS) - and on private cloud infrastructure - the OpenStack cloud platform - and worked on DynamoDB and ML.
  • Worked on large-scale Hadoop YARN cluster for distributed data processing and analysis using Data Bricks Connectors, Spark core, Spark SQL, Sqoop, Pig, Hive, Impala and NoSQL databases.
  • Exported data from DB2 to HDFS using Sqoop; loaded unstructured data into the Hadoop File System (HDFS).
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Worked on a POC to compare processing time of Impala with Apache Hive for batch applications to implement the former in project.
  • Worked with various HDFS file formats like Avro, Sequence File and various compression formats like Snappy, bzip2.
  • Used the RegEx, JSON and Avro for serialization and de-serialization packaged with Hive to parse the contents of streamed log data.
  • Converted all the vap processing from Netezza and implemented it using Spark DataFrames and RDDs.
  • Worked on writing Spark SQL scripts to optimize query performance.
  • Responsible for handling different data formats like Avro, Parquet and ORC formats.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster processing of data (a minimal sketch follows this list).
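
A minimal Scala sketch of the kind of Spark SQL access to day-partitioned Hive tables described above; the web_events table, event_day partition column, and output path are hypothetical. Filtering on the partition column lets Spark prune partitions so only the selected day's files are scanned.

    import org.apache.spark.sql.SparkSession

    object HivePartitionQuerySketch {
      def main(args: Array[String]): Unit = {
        // Hive support lets Spark SQL read tables registered in the Hive metastore.
        val spark = SparkSession.builder()
          .appName("hive-partition-query-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical table web_events, partitioned by event_day (one partition per day).
        val daily = spark.sql(
          """
            |SELECT user_id, COUNT(*) AS events
            |FROM web_events
            |WHERE event_day = '2017-03-15'
            |GROUP BY user_id
          """.stripMargin)

        // Land the aggregate back on HDFS as Parquet for downstream reporting.
        daily.write.mode("overwrite").parquet("hdfs:///tmp/reports/daily_user_events")

        spark.stop()
      }
    }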

ENVIRONMENT: Cloudera, Hortonworks distribution, HDFS, Spark, Spark SQL, Hive, Pig, MapReduce, Hue, Sqoop, PuTTY, HaaS (Hadoop as a Service), Apache Kafka, Apache Mesos, AWS, Java, Netezza, Cassandra, Oozie, Maven, Scala, SQL, Linux, Toad, YARN, Agile Methodology.

Confidential, St Louis, MO

Hadoop Developer

Responsibilities:

  • Well informed on Hadoop components such as HDFS, Job Tracker, Task Tracker, Name Node, Secondary Name Node, Data Node, YARN, and MapReduce programming.
  • Worked on MapReduce programs to cleanse and pre-process data from various sources and worked on Sequence files and Avro files on MapReduce programs.
  • Developed Cluster coordination services through Zookeeper.
  • Worked on debugging, performance tuning for better results by writing Hive UDF’s.
  • Developed Pig Latin Scripts to extract data from log files and store them to HDFS. Created User Defined Functions (UDFs) to pre-process data for analysis
  • Implemented Optimized Map Joins to get data from different sources to perform cleaning operations before applying the algorithms.
  • Created highly optimized SQL queries for MapReduce jobs, seamlessly matching the query to the appropriate Hive table configuration to generate efficient reports.
  • Used other packages such as BeautifulSoup for data parsing in Python.
  • Tuned and developed SQL on HiveQL, Drill, and Spark SQL.
  • Experience in using Sqoop to import and export the data from Oracle DB into HDFS and HIVE, HBase.
  • Implemented CRUD operations on HBase data using thrift API to get real time insights.
  • Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster for generating reports on nightly, weekly and monthly basis.
  • Worked on integrating independent microservices for real-time bidding (Scala/Akka, Firebase, Cassandra, Elasticsearch).
  • Used Slick to query and store data in the database in a Scala fashion using the powerful Scala collections framework.
  • Used Hive extensively for ETL loads on structured data.
  • Defined job flows and developed simple to complex Map Reduce jobs as per the requirement. Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Developed PIG UDFs for manipulating the data according to Business Requirements and also worked on developing custom PIG Loaders.
  • Created various parser programs to extract data from Autosys, Tibco Business Objects, XML, Informatica, Java, and database views using Scala.
  • Developed a Pig UDF to extract area information from the huge volumes of data received from sensors; responsible for creating Hive tables based on business requirements.
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access (a dynamic-partition sketch follows this list).
  • Involved in NoSQL database design, integration and implementation. Loaded data into NoSQL database HBase.
  • Worked on debugging, performance tuning PIG and HIVE scripts by understanding the joins, group and aggregation between them.
  • Used Flume extensively to gather, aggregate, and move log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
  • Connected the hive tables to Data analyzing tools like Tableau for Graphical representation of the trends.
  • Experienced in managing and reviewing Hadoop log files.
  • Involved in loading data from UNIX file system to HDFS.
  • Responsible for design & development of Spark SQL Scripts based on Functional Specifications.
  • Used Apache HUE interface to monitor and manage the HDFS storage.
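
A minimal Scala/Spark SQL sketch of the dynamic-partition loading piece mentioned above; the txn_staging and txn_partitioned tables and the load_date column are hypothetical. The Hive settings allow the INSERT to derive partition values from the data instead of requiring them to be stated explicitly.

    import org.apache.spark.sql.SparkSession

    object DynamicPartitionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("dynamic-partition-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Dynamic partitioning must be enabled before an INSERT that derives partitions from the data.
        spark.sql("SET hive.exec.dynamic.partition = true")
        spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

        // Hypothetical target table, partitioned by load date and stored as ORC.
        spark.sql(
          """
            |CREATE TABLE IF NOT EXISTS txn_partitioned (
            |  account_id BIGINT,
            |  amount     DOUBLE
            |)
            |PARTITIONED BY (load_date STRING)
            |STORED AS ORC
          """.stripMargin)

        // The partition value comes from the last column of the SELECT; one partition per distinct date.
        spark.sql(
          """
            |INSERT OVERWRITE TABLE txn_partitioned PARTITION (load_date)
            |SELECT account_id, amount, load_date FROM txn_staging
          """.stripMargin)

        spark.stop()
      }
    }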

ENVIRONMENT: HDFS, MapReduce, Pig, Mesos, AWS, Hive, Sqoop, Scala, Flume, Mahout, HBase, Spark, Spark SQL, YARN, Java, Maven, Git, Cloudera, MongoDB, Eclipse, Shell Scripting.

Confidential, San Mateo, CA

Hadoop Developer

RESPONSIBILITIES:

  • Designed and developed data movement framework for multiple sources like SQL Server, Oracle, and MySQL.
  • Effectively used Sqoop to transfer data from databases (MySQL, Oracle) to HDFS, Hive
  • Developed scripts to automate the creation of Sqoop jobs for various workflows.
  • Developed Hive scripts to alter the tables and perform required transformations.
  • Developed Java MapReduce and Pig cleansers for data cleansing.
  • Worked on Hive UDFs to mask confidential information in the data (a masking UDF sketch follows this list).
  • Designed and developed MapReduce programs for data lineage.
  • Designed and developed the framework to log information for auditing and failure recovery.
  • Worked closely with the web application development team to develop the user interface for the data movement framework.
  • Performed Job Automation by designing Oozie workflows.
  • Created MapReduce programs to handle semi-structured and unstructured data like XML, JSON, and text files, as well as Avro data files and sequence files for log files.
  • Built a RESTful web service with Python and CherryPy that retrieves data from an Accumulo data warehouse.
  • Maintaining the MySQL server and Authentication to required users for Databases. Appropriately documented various Administrative technical issues.
  • Developed MapReduce programs to extract and transform the data sets and results were exported back to RDBMS using Sqoop.
  • Built a RESTful web service for storing and retrieving documents in an Apache Accumulo data store.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Optimized our Hadoop infrastructure at both the Software and Hardware level.
  • Experience in troubleshooting in MapReduce jobs by reviewing log files.
  • Developed end-to-end search solution using web crawler, Apache Nutch & Search Platform, Apache SOLR.
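
A minimal Scala sketch of the kind of masking Hive UDF mentioned above, assuming hive-exec and hadoop-common on the classpath; the class name, visible-character count, and registration statement are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Masks all but the last four characters of a string column, e.g. for account numbers.
    // Register in Hive with: CREATE TEMPORARY FUNCTION mask_tail AS 'MaskTailUDF';
    class MaskTailUDF extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val s = input.toString
        val visible = 4
        val masked =
          if (s.length <= visible) s
          else "*" * (s.length - visible) + s.takeRight(visible)
        new Text(masked)
      }
    }

Hive would then invoke it in a query such as SELECT mask_tail(account_number) FROM accounts, with the table and column likewise hypothetical.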

ENVIRONMENT: Hadoop, Cloudera Manager, Linux, Red Hat, CentOS, Ubuntu, Scala, MapReduce, HBase, SQL, Sqoop, HDFS, Kafka, UML, Apache SOLR, Hive, Oozie, Cassandra, Maven, Pig, UNIX, Python, MRUnit, Git.

Confidential

Java/J2EE Developer

RESPONSIBILITIES:

  • Performed analysis for the client requirements based on the developed detailed design documents.
  • Developed the J2EE application based on the Service Oriented Architecture by employing SOAP and other tools for data exchanges and updates.
  • Developed the functionalities using Agile Methodology.
  • Used Apache Maven for project management and building the application.
  • Worked in all the modules of the application which involved front-end presentation logic developed using Spring MVC, JSP, JSTL and JavaScript, Business objects developed using POJOs and data access layer using Hibernate framework.
  • Used JAX-RS (REST) for producing web services and involved in writing programs to consume the web services with Apache CXF framework.
  • Used Restful API and SOAP web services for internal and external consumption.
  • Used Spring ORM module for integration with Hibernate for persistence layer.
  • Involved in writing Hibernate Query Language (HQL) for persistence layer.
  • Used Spring MVC, Spring AOP, Spring IOC, Spring Transaction and Oracle to create Club Systems Component.
  • Wrote backend jobs based on Core Java and Oracle Database to be run daily/weekly.
  • Coding the core modules of the application compliant with the Java/J2EE coding standards and Design Patterns.
  • Wrote JavaScript, HTML, CSS, Servlets, and JSPs for designing the GUI of the application.
  • Worked on server-side and middle-tier technologies, including caching strategies and solutions.
  • Designed the data access layer using J2EE data access patterns. Implemented the MVC architecture with the Struts framework for handling databases across multiple locations and displaying information in the presentation layer.
  • Used XPath for parsing the XML elements as part of business logic processing.

ENVIRONMENT: Java, JSP, HTML, CSS, Ubuntu Operating System, JavaScript, AJAX, Servlets, Struts, Hibernate, EJB (Session Beans), Log4J, WebSphere, UML, JNDI, Oracle, Windows XP, LINUX, ANT, Eclipse.

Confidential

SQL Developer/ SQL DBA

RESPONSIBILITIES:

  • Used various Transformations like Aggregator, Router, Expression, Source Qualifier, Filter, Lookup, Joiner, Sorter, XML Source qualifier, Stored Procedure and Update Strategy.
  • Created AD-Hoc reports using Report Builder and maintained Report Manager for SSRS.
  • Created number of standard reports and complex reports to analyze data using Slice & Dice and Drill Down, Drill through using SSRS.
  • Extracted, transformed, and loaded (ETL) data from flat file, raw file, XML, and other OLE DB data sources to Excel, flat file, raw file, other OLE DB, or MS SQL Server destinations using SSIS.
  • Created stored procedures to build the fact tables in the data mart for multi-dimensional analysis using SSAS, and produced ad-hoc, standard, and super-user reports using SSRS.
  • Migrated data from various sources, including flat files and MS Access, to MS SQL Server 2005 and vice versa using ETL.
  • Worked on installation, configuration, development, maintenance, administration and upgrade.
  • Generated Pie chart and Bar graphs for the exported data to represent graphical analysis.
  • Created Entity Relationship (ER) diagrams for the proposed database.
  • Maintained referential integrity, domain integrity, and column integrity using available options such as constraints.
  • Imported different datasets from MS Access to SQL Server for the analysis process.
  • Developed and deployed packages in SSIS, imported data on daily basis from the OLTP system, Staging area to Data Warehouse and Data Marts.
  • Created and implemented Cubes, and designed attribute relationships for optimal performance of Hierarchies and Fact Dimensions.
  • Worked closely with .NET developers to create and write procedures using T-SQL and PL/SQL.

ENVIRONMENT: SQL Server, Linux environment, Ubuntu Operating System, SQL queries (complex joins, sub-queries), SSIS, SSAS, SSRS, ER diagrams, Dashboards.
