Big Data Developer Resume
Atlanta, GA
SUMMARY:
- Overall 5 years of IT experience, including hands-on Big Data development.
- Expertise in Hadoop ecosystem tools, including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Scala, Kafka, YARN, and Oozie.
- Applied Hive optimization techniques such as partitioning and bucketing.
- Worked with data in multiple formats, including SequenceFile, ORC, XML, JSON, and delimited text/CSV.
- Developed Hive User Defined Functions (UDFs) to transform large volumes of data according to business requirements.
- Wrote Apache Pig scripts to process data in HDFS.
- Ability to learn and adapt quickly and to apply new tools and technologies.
- Experience with Spark SQL queries and DataFrames: importing data, applying transformations, performing read/write operations, and saving results to output directories in HDFS (see the sketch after this list).
- Experience working in Linux environments with common Linux commands.
- Experienced in implementing Spark RDD transformations.
- Used compression codecs such as Snappy to reduce storage and speed up data transfer.
- Queried data in Hive tables with HiveQL and loaded data into HBase tables.
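A minimal PySpark sketch of the Spark SQL workflow described above: read data from HDFS into a DataFrame, transform it, and save the result back to HDFS. The paths, column names, and the sales dataset are hypothetical.

```python
# Minimal PySpark sketch of the Spark SQL workflow above.
# Paths, column names, and the sales dataset are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-etl").getOrCreate()

# Import data: read delimited text/CSV from HDFS into a DataFrame.
df = spark.read.option("header", "true").csv("hdfs:///data/raw/sales")

# Perform transformations with Spark SQL functions.
daily = (df.withColumn("amount", F.col("amount").cast("double"))
           .groupBy("sale_date")
           .agg(F.sum("amount").alias("total_amount")))

# Save the results to an output directory in HDFS (ORC format).
daily.write.mode("overwrite").orc("hdfs:///data/curated/daily_sales")
```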
TECHNICAL SKILLS:
Hadoop Technologies: HDFS, MapReduce, YARN, Spark, Spark SQL, Pig, Hive, Sqoop, Hue, HBase, Oozie, Impala, Flume, Kafka (working knowledge).
IDEs: IntelliJ, Eclipse, Jupyter, PyCharm, Sublime Text.
Databases: Oracle 11g/10g, MS SQL Server, MySQL.
Operating Systems: Unix / Linux, Windows.
Programming Languages: Scala, Python.
Hadoop Distributions: Cloudera, Hortonworks
Tools: MS Office, JIRA, PuTTY, FileZilla, WinSCP
Visualization: Tableau
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, GA
Big Data Developer
Responsibilities:
- Involved in the design and development phases of the Software Development Life Cycle using Agile methodology.
- Used Sqoop to load data from an Oracle database into Hive.
- Created Hive tables, loaded data, and wrote Hive queries to requirements, with appropriate static and dynamic partitions and bucketing for efficiency.
- Used Flume to collect web logs from online ad servers and push them into HDFS.
- Loaded and transformed large sets of structured and semi-structured data.
- Imported and exported data between Oracle/DB2 and HDFS/Hive using Sqoop.
- Enhanced Hive performance by applying optimization and compression techniques.
- Imported data from various sources, performed transformations using Hive, and loaded the data into HDFS.
- Wrote Apache Pig scripts to process data in HDFS.
- Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Implemented partitioning, dynamic partitions, and bucketing in Hive on different file formats to meet business requirements (see the sketch below).
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Used Tableau to connect to Hive and generate daily data reports.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses (see the RDD sketch below).
- Wrote HBase queries for different metrics.
- Created DataFrames using Spark SQL and loaded data into the HBase NoSQL database.
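A sketch of the Hive partitioning and bucketing pattern referenced above, issued through Spark SQL with Hive support. The database, table, and column names (including the staging table) are hypothetical.

```python
# Sketch of Hive dynamic partitioning and bucketing via Spark SQL.
# Database, table, and column names are hypothetical; assumes a
# Spark session with Hive support and an existing staging table.
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("hive-partitioning")
         .enableHiveSupport().getOrCreate())

# Allow dynamic-partition inserts.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Partitioned and bucketed Hive table stored as ORC.
spark.sql("""
    CREATE TABLE IF NOT EXISTS ads.click_events (
        user_id BIGINT,
        url     STRING
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Dynamic-partition insert: the partition value comes from the data.
spark.sql("""
    INSERT OVERWRITE TABLE ads.click_events PARTITION (event_date)
    SELECT user_id, url, event_date
    FROM ads.click_events_staging
""")
```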
Skills: Spark, Spark SQL, HDFS, Hive, Pig, Kafka, Sqoop, HBase, Scala, shell scripting, Linux, MySQL, IntelliJ, Oracle, Git, Tableau.
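A minimal sketch of loading data into a Spark RDD and computing in memory, as in the RDD bullet above. The log path and line layout (status code in the ninth field) are hypothetical.

```python
# Minimal sketch of in-memory computation on a Spark RDD.
# The log path and line layout are hypothetical.
from pyspark import SparkContext

sc = SparkContext(appName="log-metrics")

# Load raw web logs from HDFS into an RDD.
logs = sc.textFile("hdfs:///data/raw/weblogs")

# Count requests per HTTP status code, entirely in memory.
status_counts = (logs.map(lambda line: line.split(" "))
                     .filter(lambda fields: len(fields) > 8)
                     .map(lambda fields: (fields[8], 1))
                     .reduceByKey(lambda a, b: a + b))

# Cache for repeated queries, then materialize the result.
status_counts.cache()
print(status_counts.collect())
```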
Confidential, Chicago, IL
Big Data Engineer
Responsibilities:
- Used Hive to analyze partitioned and bucketed data and computed various metrics for dashboard reporting.
- Optimized MapReduce code and Pig scripts; performed user interface analysis and performance tuning.
- Implemented Spark RDD transformations and wrote Spark SQL queries in Scala and Python to implement business logic.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, including joins and some pre-aggregations, before storing the data in HDFS.
- Exported analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Worked with different file formats such as CSV, XML, and JSON, and applied compression techniques to save storage.
- Wrote HBase queries for finding different metrics.
- Partitioned Hive tables and ran scripts in parallel to reduce script run times.
- Performance-tested different file compression techniques to select the best fit (see the sketch below).
- Developed Hive User Defined Functions, compiled them into JARs, added them to HDFS, and executed them from Hive queries (a Python analogue appears after this job's skills list).
- Imported data from Oracle into HDFS and Hive for analytics.
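A sketch of the compression comparison mentioned above: write the same data with different ORC codecs, then compare on-disk size and scan times. The input path and codec choices are illustrative.

```python
# Sketch of comparing compression codecs for the same dataset.
# The input path and codec choices are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compression-compare").getOrCreate()
df = spark.read.json("hdfs:///data/raw/events")

# Write identical data as ORC with Snappy and with Zlib, then compare
# storage footprint and query time offline to pick the best fit.
df.write.mode("overwrite").option("compression", "snappy") \
    .orc("hdfs:///data/bench/events_orc_snappy")
df.write.mode("overwrite").option("compression", "zlib") \
    .orc("hdfs:///data/bench/events_orc_zlib")
```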
Skills: HDFS, Hive, HBase, Sqoop, Pig, Spark, Scala
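The Hive UDFs above were Java classes packaged as JARs; as a rough Python analogue, this sketch registers a function for use inside SQL queries through Spark. The function name, logic, and sample data are hypothetical.

```python
# Rough Python analogue of a Hive UDF: register a function for SQL use.
# Function name, logic, and sample data are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

# Register a Python function so SQL queries can call it by name.
spark.udf.register("clean_name", lambda s: s.strip().title() if s else None)

spark.createDataFrame([("  john DOE ",)], ["name"]) \
    .createOrReplaceTempView("people")
print(spark.sql("SELECT clean_name(name) AS name FROM people").collect())
```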
Confidential
Python Developer
Responsibilities:
- Worked on Python software development using IDEs including PyCharm, Eclipse, Sublime Text, and Jupyter Notebook.
- Worked with OOP, multithreading, and collections concepts in Python.
- Excellent debugging, problem-solving, and optimization skills.
- Knowledge of Django framework design; developed web screens using CSS, HTML, and Bootstrap.
- Involved in the Software Development Life Cycle, including requirements gathering and design.
- Wrote and executed MySQL database queries from Python using the MySQL Connector/Python and MySQLdb packages (see the sketch after this list).
- Used Subversion version control for regular code reviews and merge requests.
- Proficient with Hive: analyzed data by writing SQL-like queries in HiveQL.
- Good experience working in Linux environments with common Linux commands.
- Delivered code following Test-Driven Development principles; used an IDE to develop the application and JIRA for bug and issue tracking.
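A minimal sketch of running MySQL queries from Python with the mysql-connector-python package, as in the bullet above. The connection details and the employees table are hypothetical.

```python
# Minimal sketch of querying MySQL from Python via mysql-connector-python.
# Connection details and the employees table are hypothetical.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="appuser", password="secret", database="hr")
try:
    cur = conn.cursor()
    # Parameterized query to avoid SQL injection.
    cur.execute("SELECT id, name FROM employees WHERE active = %s", (1,))
    for emp_id, name in cur.fetchall():
        print(emp_id, name)
finally:
    conn.close()
```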
Skills: Python, OOP, MySQL, Django, JIRA, CSS, HTML, Bootstrap, PyCharm, Eclipse, Linux, Hive.