
Hadoop/Spark Developer Resume


Chicago, IL

SUMMARY:

  • Around 8 years of professional IT experience, including 4+ years of Hadoop/Big Data experience, processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Deep understanding of Hadoop architecture (versions 1.x and 2.x) and its components such as HDFS, YARN, and the MapReduce framework, along with Hive, Pig, Kafka, Sqoop, Oozie, ZooKeeper, and NoSQL databases like HBase.
  • Experience with Scala's functional programming features, case classes, and traits; leveraged Scala to code Spark applications.
  • Expertise in writing Hive and Pig scripts and UDFs to perform data analysis on large data sets.
  • Hands-on experience in installation, configuration, management, and deployment of Big Data solutions and the underlying Hadoop cluster infrastructure using the Cloudera, Hortonworks, and MapR distributions.
  • Worked on the Hortonworks Data Platform (HDP) Hadoop distribution, using Hive to store, retrieve, and query data.
  • Created JDBC connections through Sqoop between Hortonworks and SQL Server.
  • Managed and scheduled jobs on the Hadoop cluster using Apache Oozie.
  • Partitioned and bucketed data sets in Apache Hive to improve query performance.
  • Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS, and transferred large datasets between Hadoop and RDBMSs by implementing Sqoop.
  • Performed transformations such as event joins, bot-traffic filtering, and pre-aggregations using Pig.
  • Developed MapReduce jobs to convert data files into the Parquet file format.
  • Executed Hive queries on Parquet tables to perform data analysis and meet business requirements.
  • Containerized Kafka systems using Docker.
  • Developed business-specific custom UDFs in Hive and Pig.
  • Integrated Presto with MySQL and Hive following the steps outlined in the MySQL connector and Hive documentation, respectively.
  • Experienced in writing Presto queries involving grouping and sorting on tables.
  • Used Presto with the Hive metastore to query Hive tables.
  • Experience performing SQL and Hive operations using Spark SQL.
  • Responsible for Spark Streaming using the Scala API and for processing data stored in NoSQL databases.
  • Developed Spark SQL scripts and converted Hive UDFs to Spark SQL UDFs (see the sketch after this list).
  • Developed a data pipeline using Kafka to land data into HDFS.
  • Responsible for batch processing by creating a Hive context with Spark SQL and pushing data sets into the data warehouse (DWS) for downstream use by Tableau (BI team) as well as the data science team.
  • Good knowledge of Talend DQ and data profiling; responsible for batch and real-time processing in HDFS and NoSQL databases.
  • Troubleshooting, debugging, and resolving Talend-specific issues while maintaining the health and performance of the ETL environment.
  • Extensively created mappings in Talend using tMap, tJoin, tReplicate, tParallelize, tDie, tAggregateRow, tWarn, tLogCatcher, tFilterRow, tGlobalMap, etc.
  • Experienced in working with various kinds of data sources such as Hortonworks, Teradata, and Oracle.
  • Utilized the Apache Hadoop environment provided by Hortonworks.
  • Experience with cluster management technologies such as YARN and Mesos.
  • Experienced in installation, configuration, support, and monitoring of Hadoop clusters using Cloudera distributions, Hortonworks HDP, MapR, and AWS.
  • Experience in managing Hadoop clusters using Cloudera Manager and Hue.
  • Created Kafka topics and distributed data to different consumer applications.
  • Developed applications using Java, RDBMS and UNIX Shell scripting.
  • Hands-on experience with visualization tools such as MS Excel, MS Visio, and Tableau.
  • Experience in basic Hadoop administration tasks such as replication and node removal.
  • Strong knowledge of documenting Software Requirements Specifications, including functional, data, and performance requirements.
  • Highly organized with the ability to manage multiple projects and meet deadlines.
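
A minimal sketch of the Spark SQL usage summarized above (registering a Spark SQL UDF in place of a Hive UDF and querying a partitioned Hive table); the session setup is standard Spark 2.x, but the table, column, and function names are hypothetical:

    import org.apache.spark.sql.SparkSession

    object SparkSqlUdfSketch {
      def main(args: Array[String]): Unit = {
        // Hive-enabled session so spark.sql() can see tables registered in the Hive metastore
        val spark = SparkSession.builder()
          .appName("hive-udf-to-spark-sql-udf")
          .enableHiveSupport()
          .getOrCreate()

        // Logic that previously lived in a Hive UDF, re-registered as a Spark SQL UDF
        spark.udf.register("normalize_region", (s: String) =>
          if (s == null) "UNKNOWN" else s.trim.toUpperCase)

        // Query a partitioned, bucketed Hive table; the event_date predicate prunes partitions
        val daily = spark.sql(
          """SELECT normalize_region(region) AS region, COUNT(*) AS events
            |FROM warehouse.web_events
            |WHERE event_date = '2017-01-01'
            |GROUP BY normalize_region(region)""".stripMargin)

        daily.show()
        spark.stop()
      }
    }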

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Apache Spark, Flume, Kafka, Scala, Impala

Distributions: Cloudera, AWS, Hortonworks

No SQL Databases: HBase, Cassandra

Development/Build Tools: Eclipse 4.4/4.3/3.8/3.7/3.5/3.2, NetBeans, EditPlus 2, Ant, Maven, Gradle, IntelliJ, JUnit, log4j, Spring MVC, Hibernate

Development Methodology: Agile, Unified Modeling Language (UML), Design Patterns (Core Java and J2EE), Rational Unified Process, Waterfall, Iterative.

Programming languages: SQL, Linux shell scripts, C, Java

Databases: MySQL, DB2, ODBC

Database Languages: SQL (MySQL), PL/SQL (Oracle)

RDBMS: Teradata, Oracle, MS SQL Server, MySQL, and DB2

ETL Tools: MS Office suite, RAW, Tableau, Talend

Operating Systems: UNIX, LINUX, Mac OS and Windows Variants

Web/Application Servers: WebSphere, WebLogic, JBoss, and Tomcat

File Formats: XML, Text, SequenceFile, RCFile, JSON, ORC, Avro, and Parquet

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Hadoop/Spark Developer

Responsibilities:

  • Good knowledge of and experience with Hadoop stack internals, Hive, Pig, and MapReduce.
  • Worked on the Hortonworks Data Platform Hadoop distribution, using Hive to store and retrieve data.
  • Hands-on experience loading source data such as web logs into HDFS through Kafka pipelines (a streaming sketch follows this list).
  • Experienced in loading and transforming large data sets from a Cassandra source through Kafka and placing them in HDFS for further processing.
  • Created Hive tables and loaded transactional data from RDBMSs using Kafka.
  • Built a Scala Kafka client using Akka.
  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile devices, and network devices using Apache Kafka, and stored the data in HDFS for analysis.
  • Developed Spark SQL scripts and converted Hive UDFs to Spark SQL UDFs.
  • Responsible for batch processing and real-time processing in HDFS and NoSQL databases.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet tables from Avro tables.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Loaded data into HBase using both bulk and non-bulk loads.
  • Responsible for batch processing by creating a Hive context with Spark SQL and pushing data sets into the data warehouse (DWS) for downstream use by Tableau (BI team) as well as the data science team.
  • Responsible for Spark Streaming using the Scala API and for processing data stored in NoSQL databases.
  • Experienced in performing analytics on time-series data using HBase.
  • Implemented HBase coprocessors (observers) for event-based analysis.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
  • Applied the partitioning pattern in MapReduce to route records into different categories.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
  • Implemented Hive generic UDFs to encapsulate business logic.
  • Experienced in accessing Hive tables from Java applications using JDBC to perform analytics.
  • Installed and configured Hive and wrote HiveQL scripts; experienced with multiple file formats in Hive, such as SequenceFile and ORC.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Wrote Pig Latin scripts to drive the ETL process and aggregate data on Hortonworks.
  • Extensive experience writing UNIX shell scripts and automating ETL processes with them.
  • Wrote Python scripts to parse XML and JSON reports and load the information into the database.
  • Improved existing Hadoop algorithms using Scala and Spark SQL.
  • Worked with developers to ensure technical designs were translated into ETL graphs and reviewed code to make sure it was developed per DDE design/development guidelines.
  • Coordinated with Quality Services to make sure they understood the requirements clearly for their validations.
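
A minimal Structured Streaming sketch of the Kafka-to-HDFS web-log pipeline referenced in this list; it assumes the spark-sql-kafka-0-10 connector is on the classpath (available for the Spark 2.1.0 listed below), and the broker, topic, and HDFS paths are placeholders:

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("weblogs-kafka-to-hdfs").getOrCreate()

        // Read raw web-log events from a Kafka topic (broker and topic names are placeholders)
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "weblogs")
          .load()
          .selectExpr("CAST(value AS STRING) AS line", "timestamp")

        // Land the stream in HDFS as Parquet; the checkpoint directory tracks Kafka offsets
        val query = raw.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/raw/weblogs")
          .option("checkpointLocation", "hdfs:///checkpoints/weblogs")
          .start()

        query.awaitTermination()
      }
    }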

Environment: Cassandra, MapReduce, Spark 2.1.0, Spark SQL, Pig scripts, Pig UDFs, Oozie, Hive, Avro, Scala, Kafka, RESTful services, Java, IntelliJ, AWS, Python, UNIX, Oracle DB.

Confidential, Phoenix, AZ

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Worked on the Hortonworks Data Platform Hadoop distribution, using Hive to store and retrieve data.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing the results back into the RDBMS through Sqoop (see the sketch after this list).
  • Designed and implemented Hive and Pig UDFs in Python for evaluating, filtering, loading, and storing data.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
  • Wrote Pig Latin scripts to drive the ETL process and aggregate data on Hortonworks.
  • Created JDBC connections through Sqoop between Hortonworks and SQL Server.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Created new custom columns depending on the use case while ingesting data into the Hadoop data lake using PySpark.
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
  • Worked with application teams to install Hadoop updates, patches and version upgrades as required.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Implemented best income logic using Pig scripts and UDFs.
  • Implemented test scripts to support test driven development and continuous integration.
  • Managed and reviewed Hadoop log files; used Scala to integrate Spark with Hadoop.
  • Responsible for managing data coming from different sources.
  • Implemented the Oozie workflow engine to run multiple Hive and Python jobs.
  • Migrated data in the Hadoop cluster into Spark and used Spark SQL and Scala to perform actions on the data.
  • Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
  • Troubleshot, managed, and reviewed data backups and Hadoop log files.
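
A minimal sketch of the Scala/Spark work described in this list, showing the same aggregation expressed with both the DataFrame API and an RDD; the Hive table and column names are hypothetical, and the final Sqoop export to the RDBMS is not shown:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object AggregationSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("orders-aggregation")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        val orders = spark.table("warehouse.orders")   // hypothetical Hive table

        // DataFrame / Spark SQL style aggregation
        val byCustomer = orders
          .groupBy($"customer_id")
          .agg(sum($"amount").as("total_amount"), count("*").as("order_count"))

        // Equivalent RDD (MapReduce-style) aggregation over the same data
        val byCustomerRdd = orders
          .select($"customer_id", $"amount")
          .rdd
          .map(r => (r.getString(0), r.getDouble(1)))
          .reduceByKey(_ + _)
        byCustomerRdd.take(5).foreach(println)

        // Results were staged to a Hive table; the export to the RDBMS was done with Sqoop (not shown)
        byCustomer.write.mode("overwrite").saveAsTable("warehouse.customer_totals")

        spark.stop()
      }
    }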

Environment: HDFS, MapReduce, Spark, Kafka, Storm, Scala, YARN, Hive, Pig, Sqoop, Flume, Oozie, Impala, Hortonworks, HBase, Cassandra, Oracle 11g, Python, Shell scripting, Perl, Linux, SVN, GitHub.

Confidential, Phoenix, AZ

Hadoop Developer

Responsibilities:

  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases, to compare with historical data.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Transformed incoming data with Hive and Pig to make it available to internal users.
  • Performed extensive data mining using Hive.
  • Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Processed the data and pushed valid records to HDFS.
  • Imported data from MySQL into HDFS using Sqoop.
  • Tuned MapReduce, Pig, and Hive jobs to increase performance and decrease execution time.
  • Compressed files downloaded from the servers before storing them in the cluster to save cluster resources.
  • Wrote core Java programs to convert JSON files to CSV or TSV files for further processing.
  • Optimized existing long-running MapReduce and Pig jobs for better performance and accurate results.
  • Created Hive databases and tables over the HDFS data and wrote HiveQL queries against them (see the sketch after this list).
  • Scheduled Hadoop and UNIX jobs using Oozie.
  • Worked with NoSQL databases like HBase.
  • Wrote Pig and Hive UDFs for processing and analyzing log files.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Visualized complex data analysis on dashboards per the business requirements.
  • Integrated Hive, Pig, and MapReduce jobs with Elasticsearch to publish metrics to the dashboards.
  • Utilized commonly used Talend components such as tMap, tFilterRow, tAggregateRow, tFileExist, tFileCopy, tFileList, and tDie.
  • Utilized Talend Big Data components such as tSqoopExport, tSqoopImport, tHDFSInput, tHDFSOutput, tHiveLoad, tHiveInput, tPigLoad, tPigFilterRow, tPigFilterColumn, tPigStoreResult, tHBaseInput, and tHBaseOutput; executed jobs in debug mode and used the tLogRow component to view sample output.
  • Submitted Talend jobs for scheduling using the Talend scheduler available in the Administration Console.
  • Deployed Talend jobs to various environments, including dev, test, and production.
  • Involved in the analysis, design, and testing phases, and responsible for documenting the technical specifications.
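
A minimal sketch of the "Hive databases and tables over the HDFS data" work from this list; it is written in Scala for consistency with the rest of this resume and uses the standard HiveServer2 JDBC driver, while the host, HDFS location, and table layout are assumptions:

    import java.sql.DriverManager

    object HiveOverHdfsSketch {
      def main(args: Array[String]): Unit = {
        // HiveServer2 JDBC endpoint; host, port, and user are placeholders
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default", "etl_user", "")
        val stmt = conn.createStatement()

        // Database and external table defined over data already landed in HDFS (layout is hypothetical)
        stmt.execute("CREATE DATABASE IF NOT EXISTS clickstream")
        stmt.execute(
          """CREATE EXTERNAL TABLE IF NOT EXISTS clickstream.events (
            |  user_id STRING, url STRING, event_ts STRING)
            |PARTITIONED BY (event_date STRING)
            |STORED AS ORC
            |LOCATION '/data/clickstream/events'""".stripMargin)
        // Register partition directories that already exist under the table location
        stmt.execute("MSCK REPAIR TABLE clickstream.events")

        // A HiveQL query against the table
        val rs = stmt.executeQuery(
          "SELECT event_date, COUNT(*) FROM clickstream.events GROUP BY event_date")
        while (rs.next()) println(s"${rs.getString(1)}\t${rs.getLong(2)}")

        conn.close()
      }
    }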

Environment: Hadoop 2.x, YARN, HDFS, MapReduce, Pig, Hive, HBase, Shell scripting, Java, Oozie, Talend, Linux.

Confidential, Franklin Lakes, NJ

Hadoop Developer

Responsibilities:

  • Collected log data and staging data using Apache Flume and stored in HDFS for analysis.
  • Implemented helper classes that access HBase directly from Java using the Java API to perform CRUD operations (see the sketch after this list).
  • Handled time-series data in HBase, storing data and performing time-based analytics to improve query retrieval time.
  • Developed MapReduce programs to parse the raw data and store the refined data in tables.
  • Performed debugging and fine-tuning in Hive and Pig to improve performance.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Performed map-side joins on data in Hive to explore business insights.
  • Involved in forecasting based on the present results and insights derived from data analysis.
  • Integrated MapReduce with HBase to import bulk amounts of data into HBase using MapReduce programs.
  • Worked on real-time streaming using Apache Storm.
  • Designed the Apache Storm topology flow.
  • Participated in team discussions to develop useful insights from big data processing results.
  • Suggested trends to the higher management based on social media data.
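
A minimal sketch of the HBase helper-class CRUD access mentioned in this list; it is written in Scala against the HBase 1.x Java client API for consistency with the rest of this resume, and the table name, column family, and row-key layout are hypothetical:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseCrudSketch {
      def main(args: Array[String]): Unit = {
        // Cluster settings are picked up from hbase-site.xml on the classpath
        val conf = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(conf)
        val table = connection.getTable(TableName.valueOf("metrics"))   // hypothetical table

        // Create/update: time-series style row key of <sensor>#<timestamp>
        val rowKey = Bytes.toBytes("sensor42#20140301120000")
        val put = new Put(rowKey)
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"), Bytes.toBytes("17.5"))
        table.put(put)

        // Read the same row back
        val result = table.get(new Get(rowKey))
        val value = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("value")))
        println(s"value = $value")

        table.close()
        connection.close()
      }
    }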

Environment: HDFS, MapReduce, Hive, HBase, Pig, Java, Git, Maven, Storm, PuTTY, REST, CentOS 6.3

Confidential

SQL Developer

Responsibilities:

  • Wrote complex stored procedures to process prospective customer information and balanced the processing load between the front end and back end.
  • Enhanced the old logical and physical database design to fit new business requirements and implemented the new design in SQL Server 2000.
  • Monitored queries using Query Analyzer and tuned queries and procedures to boost database performance.
  • Used SQL Profiler to optimize remote procedures and queries by creating workload files and setting various filters and parameters such as cache hit ratios.
  • Created stored procedures to back up transaction logs, flush/grow/shrink log files, and maintain and archive tables.
  • Assisted in designing and creating the user interface.
  • Implemented client-side validations using VBScript.
  • Responsible for creating SQL procedures, triggers, temp tables, and views for report development.
  • Involved in database design, data modeling, and writing stored procedures.
  • Designed the database and generated customized reports.

Environment: Visual Studio .NET, IIS 5.1, SQL, HTML, SQL Server 2000, SQL Profiler, .NET Framework 2.0.
