Sr. Big Data Developer Resume

North America, WI

SUMMARY:

  • Hadoop Developer with 8+ years of overall IT experience in a variety of industries, which includes hands on experience in Big Data technologies.
  • Have 4+ years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark, Kafka and HBase).
  • Good working experience with Spark (Spark Streaming, Spark SQL), Scala and Kafka.
  • Worked extensively with Spark using Scala on a cluster for computational analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
  • Excellent Programming skills at a higher level of abstraction using Scala, Java and Python.
  • Experience in using DStreams, accumulators, broadcast variables and RDD caching for Spark Streaming.
  • Hands-on experience in developing Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
  • Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Good experience in designing and creating data ingest pipelines using technologies such as Apache Storm and Kafka.
  • Experienced in working with in-memory processing using Spark transformations, Spark SQL, MLlib and Spark Streaming.
  • Expertise in creating Custom Serdes in Hive.
  • Good working experience on using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Experienced in implementing POCs using the Spark SQL and MLlib libraries.
  • Improved the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
  • Hands on experience in handling Hive tables using Spark SQL.
  • Efficient in writing MapReduce Programs and using Apache Hadoop API for analyzing the structured and unstructured data.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Debugged Pig and Hive scripts, and debugged and optimized MapReduce jobs.
  • Administrator for Pig, Hive and HBase, installing updates, patches and upgrades.
  • Hands-on experience in managing and reviewing Hadoop logs.
  • Good knowledge about YARN configuration.
  • Expertise in writing Hadoop Jobs for analyzing data using Hive QL (Queries), Pig Latin (Data flow language), and custom MapReduce programs in Java.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
  • Good working knowledge on NoSQL databases such as Hbase, MongoDB and Cassandra.
  • Used Hbase in accordance with PIG/Hive as and when required for real time low latency queries.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie (hive, pig) and Zookeeper (Hbase).
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Developed various shell scripts and python scripts to address various production issues.
  • Developed and designed automation framework using Python and Shell scripting
  • Generated Java APIs for retrieval and analysis on No-SQL database such as HBase and Cassandra.
  • Experience in AWS EC2, configuring the servers for Auto scaling and Elastic load balancing.
  • Configured AWS EC2 instances in a VPC network, managed security through IAM and monitored server health through CloudWatch.
  • Good knowledge of data compression and serialization formats such as Snappy and Avro.
  • Developed automated workflows for monitoring the landing zone for the files and ingestion into HDFS in Bedrock Tool and Talend.
  • Created Talend Jobs for data comparison between tables across different databases, identify and report discrepancies to the respective teams.
  • Delivered zero defect code for three large projects which involved changes to both front end (Core Java, Presentation services) and back-end (Oracle).
  • Experience with all stages of the SDLC and Agile Development model right from the requirement gathering to Deployment and production support.
  • Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
  • Also have experience in understanding of existing systems, maintenance and production support, on technologies such as Java, J2EE and various databases (Oracle, SQL Server).

TECHNICAL SKILLS:

Big Data: Cloudera Distribution, HDFS, Zookeeper, YARN, Data Node, Name Node, Resource Manager, Node Manager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Storm, Scala, Impala

Operating System: Windows, Linux, Unix.

Languages: Java, J2EE, SQL, Python, Scala

Databases: IBM DB2, Oracle, SQL Server, MySQL, PostgreSQL

Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT.

Version Tools: GIT, SVN, CVS

IDE: IBM RAD, Eclipse, IntelliJ

Tools: TOAD, SQL Developer, ANT, Log4J

Web Services: WSDL, SOAP.

ETL: Talend ETL, Talend Studio

Web/App Server: UNIX server, Apache Tomcat

PROFESSIONAL EXPERIENCE:

Sr. Big Data Developer

Confidential, North America, WI

Responsibilities:

  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
  • Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Worked with the Spark for improving performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark-SQL, Spark MLlib, Data Frame, Pair RDD's, Spark YARN.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building common learner data model which gets the data from Kafka in Near real time and persist it to Cassandra.
  • Developed Kafka consumer's API in Scala for consuming data from Kafka topics.
  • Consumed XML messages using Kafka and processed the XML files using Spark Streaming to capture UI updates.
  • Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
  • Experienced in writing live Real-time Processing and core jobs using Spark Streaming with Kafka as a data pipe-line system.
  • Worked with AWS cloud services such as EC2, S3, EBS, RDS and VPC.
  • Migrated an existing on-premises application to AWS, using EC2 and S3 for processing and storage of small data sets; maintained the Hadoop cluster on AWS EMR.
  • Imported data from AWS S3 into Spark RDD, Performed transformations and actions on RDD's
  • Gathered the business requirements from the Business Partners and Subject Matter Experts
  • Developed environmental search engine using JAVA, Apache SOLR and Cassandra.
  • Managed work with the SOLR search engine, including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists, custom sorting and regionalization.
  • Ingested data from RDBMS and performed data transformations, and then export the transformed data to Cassandra as per the business requirement.
  • Written multiple MapReduce programs for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV & other compressed file formats.
  • Developed automated processes for flattening the upstream data from Cassandra, which is in JSON format. Used Hive UDFs to flatten the JSON data.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms
  • Developed PIG UDFs to provide Pig capabilities for manipulating the data according to Business Requirements and worked on developing custom PIG Loaders and Implemented various requirements using Pig scripts.
  • Experienced on loading and transforming of large sets of structured, semi structured and unstructured data
  • Created a POC using the Spark SQL and MLlib libraries.
  • Developed a Spark Streaming module for consumption of Avro messages from Kafka.
  • Implemented different machine learning techniques in Scala using a Scala machine learning library.
  • Experienced in Querying data using SparkSQL on top of Spark Engine, implementing Spark RDD’s in Scala.
  • Expertise in writing Scala code using Higher order functions for iterative algorithms in Spark for Performance considerations.
  • Experienced in managing and reviewing Hadoop log files
  • Worked with different File Formats like TEXTFILE, AVROFILE, ORC, and PARQUET for HIVE querying and processing.
  • Created and maintained Teradata tables, views, macros, triggers and stored procedures.
  • Monitored workload, job performance and capacity planning using Cloudera Distribution.
  • Worked on Data loading into Hive for Data Ingestion history and Data content summary.
  • Involved in developing Impala scripts for extraction, transformation, loading of data into data warehouse.
  • Used Hive and Impala to query the data in HBase.
  • Created Impala tables and SFTP scripts and Shell scripts to import data into Hadoop.
  • Developed Hbase java client API for CRUD Operations.
  • Created Hive tables and involved in data loading and writing Hive UDFs. Developed Hive UDFs for rating aggregation
  • Generated Java APIs for retrieval and analysis on No-SQL database such as HBase and Cassandra
  • Provided ad-hoc queries and data metrics to the Business Users using Hive, Pig
  • Did various performance optimizations like using distributed cache for small datasets, partition and bucketing in hive, doing map side joins etc
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop for analysis, visualization and to generate reports.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS
  • Experienced with AWS services to smoothly manage application in the cloud and creating or modifying the instances.
  • Created data pipeline for different events of ingestion, aggregation and load consumer response data in AWS S3 bucket into Hive external tables in HDFS location to serve as feed for tableau dashboards.
  • Used EMR (Elastic MapReduce) to perform big data operations in AWS.
  • Wrote Python applications on Apache Spark to parse and convert txt and xls files.
  • Developed Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
  • Installed the application on AWS EC2 instances and configured the storage on S3 buckets.
  • Loading data from different source (database & files) into Hive using Talend tool.
  • Implemented Spark jobs in Python/Scala, utilizing Spark Core, Spark Streaming and Spark SQL for faster data processing in place of MapReduce in Java.
  • Experience in integrating Apache Kafka with Apache Spark for real time processing.
  • Exposure to using Apache Kafka to develop a data pipeline that handles logs as a stream of messages using producers and consumers.
  • Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV etc
  • Involved in running Hadoop Streaming jobs to process Terabytes of data
  • Used JIRA for bug tracking and CVS for version control.

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, SOLR, CDH3, Cassandra, J2EE, Oracle/SQL & DB2, Unix/Linux, JavaScript, Ajax, Eclipse IDE, CVS, JIRA

Big Data Engineer

Confidential, Palo Alto, CA

Responsibilities:

  • Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
  • Experienced in designing and deploying Hadoop clusters and different big data analytic tools including Pig, Hive, Flume, HBase and Sqoop.
  • Imported weblogs and unstructured data using Apache Flume and stored them in a Flume channel.
  • Loaded the CDRs from relational DB using Sqoop and other sources to Hadoop cluster by Flume.
  • Developed business logic in Flume interceptor in Java.
  • Implementing quality checks and transformations using Flume Interceptor.
  • Developed simple and complex MapReduce programs in Hive, Pig and Python for Data Analysis on different data formats.
  • Performed data transformations by writing MapReduce and Pig scripts as per business requirements.
  • Implemented Map Reduce programs to handle semi/unstructured data like xml, json, Avro data files and sequence files for log files.
  • Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks and analysis.
  • Experienced in Kerberos authentication to establish a more secure network communication on the cluster.
  • Analyzed substantial data sets by running Hive queries and Pig scripts.
  • Managed and reviewed Hadoop and HBase log files.
  • Experience in creating, dropping and altering tables at run time without blocking updates and queries using HBase and Hive.
  • Experienced in writing Spark Applications in Scala and Python.
  • Used Spark SQL to handle structured data in Hive.
  • Imported semi-structured data from Avro files using Pig to make serialization faster
  • Processed the web server logs by developing Multi-hop flume agents by using Avro Sink and loaded into MongoDB for further analysis.
  • Experienced in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Experienced in connecting Avro Sink ports directly to Spark Streaming for analysis of weblogs.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
  • Managing and scheduling Jobs on a Hadoop Cluster using Oozie workflows and Java schedulers.
  • Continuously monitored and managed the Hadoop cluster through the Hortonworks (HDP) distribution.
  • Configured various views in Ambari such as Hive view, Tez view, and Yarn Queue manager.
  • Involved in review of functional and non-functional requirements.
  • Indexed documents using Elasticsearch.
  • Worked on MongoDB for distributed Storage and Processing.
  • Implemented Collections and Aggregation Frameworks in MongoDB.
  • Implemented B Tree Indexing on the data files which are stored in MongoDB.
  • Good knowledge in using MongoDB CRUD operations.
  • Responsible for using a Flume sink to remove the data from the Flume channel and deposit it in a NoSQL database such as MongoDB.
  • Implemented a Flume NG MongoDB sink to load JSON-styled data into MongoDB.
  • Involved in loading data from UNIX file system and FTP to HDFS.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Loaded JSON-Styled documents in NoSQL database like MongoDB and deployed the data in cloud service Amazon Redshift.
  • Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in Amazon EMR.
  • Used Zookeeper to provide coordination services to the cluster.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generated data visualizations using Tableau.
  • Experience in processing large volume of data and skills in parallel execution of process using Talend functionality.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Experience in optimizing Map Reduce Programs using combiners, partitioners and custom counters for delivering the best results.
  • Written Shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Involved in Hadoop cluster task like Adding and Removing Nodes without any effect to running jobs and data.
  • Followed Agile methodology for the entire project.
  • Experienced in Extreme Programming, Test-Driven Development and Agile Scrum

Environment: Hortonworks (HDP), Hadoop, Spark, Sqoop, Flume, Elasticsearch, AWS, EC2, S3, Pig, Hive, MongoDB, Java, Python, MapReduce, HDFS, Tableau, Informatica.

Hadoop Developer

Confidential, Hartford, CT

Responsibilities:

  • Analyzing Hadoop cluster and different big data analytic tools including Pig, HBase and Sqoop.
  • Worked with Linux systems and RDBMS database on a regular basis in order to ingest data using Sqoop.
  • Collected and aggregated large amounts of web log data from different sources such as webservers, mobile and network devices using Apache and stored the data into HDFS for analysis.
  • Strong knowledge on creating and monitoring cluster on Hortonworks Data platform.
  • Developed Unix shell scripts to load large number of files into HDFS from Linux File System
  • Developed Custom Input Formats in MapReduce jobs to handle custom file formats and to convert them into key-value pairs.
  • Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
  • Created the Hive external tables using Accumulo connector.
  • Designed and implemented HIVE queries and functions for evaluation, filtering, loading and storing of data.
  • Extensive experience in writing UNIX shell scripts and automation of the ETL processes using UNIX shell scripting.
  • Design, develop, test, implement and support of Data Warehousing ETL using Talend and Hadoop Technologies.
  • Involved in performance tuning of the ETL process by addressing various performance issues at the extraction and transformation stages.
  • Worked with BI teams in generating the reports and designing ETL workflows on Tableau
  • Prepared the Technical Specification document for the ETL job development.
  • Involved in loading data from UNIX file system and FTP to HDFS
  • Used HIVE to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Developed UDF's in java for enhancing functionalities of Pig and Hive scripts.
  • Experienced in working with spark eco system using Spark SQL and Scala queries on different formats like Text file, CSV file.
  • Implemented daily cron jobs that automate parallel tasks of loading the data into HDFS and pre-processing with Pig using Oozie coordinator jobs.
  • Worked on the Ad hoc queries, Indexing, Replication, Load balancing, Aggregation in MongoDB.
  • Experience in managing MongoDB environment from availability, performance and scalability perspectives.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.

Environment: Hadoop, HDFS, Hortonworks, Pig, Hive, HBase, Flume, Sqoop, Oozie, Python, Unix, Shell Scripting, Talend, Tableau, Scala, Spark, Spark SQL, MongoDB.

Hadoop Developer

Confidential, Houston, Texas

  • Experience in Importing and exporting data into HDFS and Hive using Sqoop.
  • Developed Flume Agents for loading and filtering the streaming data into HDFS.
  • Experienced in handling data from different data sets, joining them and preprocessing them using Pig join operations.
  • Moved bulk data into HBase using MapReduce integration.
  • Developed Map-Reduce programs to clean and aggregate the data
  • Developed HBase data model on top of HDFS data to perform real time analytics using Java API.
  • Developed different kind of custom filters and handled pre-defined filters on HBase data using API.
  • Strong understanding of Hadoop eco system such as HDFS, MapReduce, HBase, Zookeeper, Pig, Hadoop streaming, Sqoop, Oozie and Hive
  • Implement counters on HBase data to count total records on different tables.
  • Experienced in handling Avro data files by passing schema into HDFS using Avro tools and Map Reduce.
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
  • Implemented Secondary sorting to sort reducer output globally in map reduce.
  • Implemented data pipeline by chaining multiple mappers by using Chained Mapper.
  • Created Hive Dynamic partitions to load time series data
  • Experienced in handling different types of joins in Hive, such as map joins, bucket map joins and sorted bucket map joins.
  • Created tables, partitions, buckets and perform analytics using Hive ad-hoc queries.
  • Experienced in importing and exporting data between HDFS/Hive and relational databases and Teradata using Sqoop.
  • Handled continuous streaming data from different sources using Flume, with HDFS set as the destination.
  • Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
  • Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters
  • Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, RDBMS/DB, Flat files, MySQL, CSV, Avro data files.

Java/J2EE Developer

Confidential, IN

Responsibilities:

  • Designed and developed a system framework using J2EE technologies based on MVC architecture.
  • Followed agile methodology to implement the requirements and tailored the application to customer needs.
  • Involved in the phases of SDLC (Software Development Life Cycle) including Requirement collection, Design and analysis of Customer specification, Development and Customization of the application
  • Developed and enhanced web applications using JSTL, JSP, JavaScript, AJAX, HTML, CSS and collection.
  • Developed the UI components using JQuery and JavaScript Functionalities.
  • Developed J2EE components on Eclipse IDE.
  • Created the EAR and WAR files and deployed the application in different environment.
  • Used JNDI as part of service locator to locate the Factory objects, Data Source Objects and other service factories.
  • Hands on experience using Teradata utilities (FastExport, MultiLoad, FastLoad, Tpump, BTEQ and QueryMan).
  • Implemented test scripts to support test driven development and continuous integration.
  • Modifications on the database were done using Triggers, Views, Stored procedures, SQL and PL/SQL.
  • Implemented the mechanism of logging and debugging with Log4j.
  • Used JIRA as a bug-reporting tool for updating the bug report.

Environment: Java, J2EE, Servlets, JSP, Struts, Spring, Hibernate, JDBC, JNDI, JMS, JIRA, JavaScript, XML, DB2, SVN, log4j.

Java Developer

Confidential

  • Involved in requirements, design, development and integration testing of modules.
  • Worked on use case diagrams, class diagrams and sequence diagrams using Rational Rose for the design phase.
  • Used Agile methodology for every module in the project while developing the application.
  • Developed the application implementing Spring MVC Architecture with Hibernate as ORM framework.
  • Developed the application using Front Controller, Business delegate, DAO and Session Facade patterns.
  • Designed and developed the user interface using JSP, HTML, CSS, MXML, JSF, JSTL, AJAX and XML; also involved in designing and developing several Flex UI screens.
  • Involved in Design and developing user interface using Flex Components ViewStack, Checkboxes, Repeater, Title.
  • Involved in developing database transactions through JDBC.
  • Used XML with DOM and SAX parsers to transfer data between different components.
  • Extensively worked in developing Custom tags from Struts tags for highlighting the invalid input fields if validation error occurs.
  • Developed WSDL-based web services using WSDL, SOAP, JAX-WS, Axis, Apache XFire and JAXB.
  • Developed RESTful web services exchanging XML and JSON using JAX-RS.
  • Used CVS for version control.
  • Developed and deployed the applications using servers like Apache Tomcat, JBoss.
  • Created test cases using JUnit and FlexUnit.
  • Wrote Maven build scripts for building applications.

Environment: Java, J2EE, MVC, Servlets, Spring, JSP, XML, HTML, MXML, Maven, Adobe Flex Builder, Flex API, BlazeDS, Flex, Tag libs, REST, CSS, JavaScript, JQuery, AJAX, JSON, CAS, Eclipse, Apache Tomcat 7, JBoss, Web Services WSDL, SOAP, Restful, JUnit, FlexUnit, ClearCase
