Big Data Developer Resume
NY
OBJECTIVE:
- Qualified professional and Hortonworks Certified Developer (HDPCD) with 6+ years of overall experience, seeking a Big Data Hadoop Developer role working with open-source technologies such as Hadoop and Spark, where I can apply well-honed Hadoop skills to help users develop applications using modern development methods and methodologies.
SUMMARY
- Good knowledge of open-source Apache Hadoop, the HDFS file system, YARN, Sqoop, Hive, Spark, Python, and core Java on the Hortonworks Data Platform (HDP 2.x).
- Strong understanding of building data pipelines on distributed technologies using tools such as Sqoop, Hive, and Spark.
- Experience working with BI teams to translate big data requirements into Hadoop-centric solutions.
- Hands-on experience with the Hadoop ecosystem: HDFS, MapReduce, Hive, Spark, Kafka, and HBase/Phoenix.
- Knowledge of developing end-to-end ingestion pipelines that connect data sources outside the Hadoop environment, such as RDBMS databases and log files, into the Hadoop data lake.
- Good knowledge of Linux operating systems across distributions, including RHEL (4, 5, 6) and CentOS.
- Worked on different file formats like ORC, JSON, and Avro.
- Proficient in Apache Spark and SparkSQL.
- Expertise using Spark Framework for batch and real time data processing.
- Good working knowledge of various RDBMS products such as Oracle (10g, 11g, 12c), SQL Server, DB2, and MySQL.
- Excellent SQL development knowledge, including Data Definition Language (DDL), Data Query Language (DQL), and Data Manipulation Language (DML).
- Good knowledge of database design for stability, reliability, and performance.
- Expertise in writing complex queries, including various types of joins, to address business needs.
- Expertise in data normalization and denormalization techniques.
- Experience in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
- Hands-on experience implementing sequence files, combiners, counters, dynamic partitions, and bucketing for best practice and performance improvement (a minimal sketch follows this summary).
- In-depth understanding of Hadoop architecture and its components, including HDFS, MapReduce, Hadoop 2 HDFS Federation, High Availability, and YARN, with a good understanding of workload management, scalability, and distributed platform architectures.
- Well versed in security concepts such as Kerberos authentication and Ranger authorization.
- Experienced in moving data from different sources using Kafka producers and consumers, and in preprocessing data with Spark Streaming.
- Strong analytical and interpersonal skills, with the ability to work in a team or individually, a can-do attitude, and the ability to grasp new technology concepts quickly.
- Technical expertise in UNIX/Linux shell commands.
- Expert in UNIX shell scripting (ksh, bash, sh) with excellent knowledge of Python scripting.
- Experienced with patch and package administration including building of packages and RPMs.
- Experienced in diagnosing and resolving TCP/IP connection problems.
- Experienced in creating Disaster Recovery (DR) and disaster contingency plans.
- Excellent written and verbal communication skills.
- Ability to operate effectively on a 24x7 basis in crisis situations.
- Effective team player with excellent logical and analytical abilities.
- In-depth knowledge of computer applications and scripting, including Shell, Python, and XML.
- Used cron and AutoSys for enterprise job scheduling.
- Proficient in working with Java, bash and Python scripts.
- Expertise in querying RDBMS such as Oracle, MySQL, and SQL Server.
- Expertise in data migration from various databases to Hadoop HDFS and Hive using Sqoop.
- Expertise in building the data pipeline using Python.
- In-depth knowledge of the Hadoop ecosystem and the MapReduce framework.
- Worked with Hive’s data warehousing infrastructure to analyze large structured datasets.
- Skilled in creating managed and external tables in the Hive ecosystem.
- Extensive working knowledge of designing and developing business processes using Hive, Sqoop, HBase, and Spark.
- Experience with version control tools such as SVN and Git (GitHub), JIRA/Mingle for issue tracking, and Crucible for code reviews.
- Good understanding of the principles and best practices of Software Configuration Management (SCM); worked closely with development, QA, and other teams to keep automated test efforts tightly integrated with the build system and to fix deployment and build errors.
- Extensive experience in all phases of the Software Development Life Cycle, with emphasis on designing, developing, implementing, deploying, and supporting scalable, secure, transactional distributed enterprise J2EE applications.
- Experience in UNIX shell scripting, FTP, SFTP and file management in various UNIX environments.
- Excellent communication and interpersonal skills. Detail-oriented, self-motivated, responsible, analytical, deadline-driven quick learner and team player with the ability to coordinate in a team environment.
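To illustrate the partitioning and bucketing practices above: a minimal, hypothetical PySpark sketch assuming a Spark 2.x session with Hive support; the table and column names are placeholders, not a real project schema.

```python
# Hypothetical Hive DDL issued through PySpark: a table partitioned by load
# date (enables partition pruning) and bucketed by customer id (speeds up
# joins and sampling). All names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partition-bucket-demo")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")
```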
TECHNICAL SKILLS
Open Source Technologies: Apache Hadoop, HDFS, YARN, Capacity Scheduler, Hive, Spark (PySpark), Sqoop, HBase, Kafka, ZooKeeper
Operating Systems: UNIX, Linux, macOS, Windows
Version Control Tools: Git, GitHub, GitLab
Scripting Languages: Python, Shell (bash), SQL
Data Formats: XML, JSON, YAML
Build Tools: Maven, Ant
Web Technologies: JDBC, XML, HTML
Bug Tracking Tools: JIRA
Programming Languages: Python, C, Core Java
Databases: MySQL, Oracle, SQL Server, DB2
PROFESSIONAL EXPERIENCE:
Confidential, NY
Big Data Developer
Responsibilities:
- Used Hortonworks Data Platform (HDP 2.5.0, HDP 2.6.1) components to build various data pipelines.
- Used ingestion tools: HDFS put commands to ingest batch and log files, Sqoop to ingest data from RDBMS sources such as Oracle, SQL Server, and MySQL into the data lake, and Spark Streaming to ingest real-time data from Kafka topics.
- Used transformation tools such as Apache Hive/Tez and Apache Spark (PySpark) to transform sources such as JSON, CSV, and pipe-delimited files into HDFS datasets and Hive and Phoenix tables.
- Worked on Spark/PySpark programming to create UDFs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames in Python (see the DataFrame sketch after this list).
- Transferred purchase transaction details from legacy systems to HDFS.
- Wrote Python scripts to make Spark Streaming work with Kafka as part of Spark-Kafka integration efforts (see the streaming sketch below).
- Used Sqoop to transfer data between relational databases and Hadoop.
- Worked on HDFS to store and access huge datasets within Hadoop.
- Good hands-on experience with Git and GitHub.
- Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
- Tuned various Hive and Spark jobs for better performance with techniques such as partitioning, bucketing, and repartitioning.
- Created many HQL schemas and reused them across the application where required.
- Analyzed existing code and fixed bugs where required.
- Evaluated various file formats for better performance.
- Developed various Hive scripts, PySpark scripts and shell scripts to deploy Hadoop applications in production.
- Converted unstructured data to structured data by writing Spark code.
- Participated in building scalable distributed data solutions using Hadoop.
- Performed data imports from several data sources and transformations using Hive and Spark SQL (PySpark).
- Performed data validation, cleansing, and aggregation using a series of Spark applications.
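The Hive-to-DataFrame conversion and UDF work above can be illustrated with a small, hypothetical PySpark sketch (Spark 2.x); the txns table and its columns are placeholders rather than the actual project schema.

```python
# Hypothetical rewrite of a HiveQL expression as a DataFrame transformation
# with a Python UDF. The table "txns" and its columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("hive-to-dataframe-demo")
         .enableHiveSupport()
         .getOrCreate())

@udf(returnType=StringType())
def normalize_status(raw):
    # Equivalent of UPPER(TRIM(status)) in HiveQL, tolerant of NULLs.
    return (raw or "").strip().upper()

# Roughly: SELECT txn_id, UPPER(TRIM(status)) AS status FROM txns WHERE amount > 0
clean = (spark.table("txns")
         .where(col("amount") > 0)
         .select("txn_id", normalize_status(col("status")).alias("status")))
clean.write.mode("overwrite").saveAsTable("txns_clean")
```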
Environment: RHEL 6.7, Hortonworks Data Platform (2.5.0 and 2.6.1), Apache Hadoop 2.7.1, Apache YARN 2.7.1, Apache Hive 1.2.1, Apache Spark 1.6.x and 2.2.x, Phoenix 4.7, HBase 1.1, Kafka 0.10, Kerberos, Ranger
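As a companion to the Spark-Kafka integration work in this role, a minimal sketch using the classic DStream API available in the Spark 1.6-2.x versions listed above; the broker address, topic, and output path are hypothetical.

```python
# Hypothetical Kafka-to-HDFS ingestion with the DStream API
# (pyspark.streaming.kafka). Broker, topic, and path names are placeholders.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-ingest-demo")
ssc = StreamingContext(sc, batchDuration=10)   # 10-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["purchase_events"],                          # placeholder topic
    kafkaParams={"metadata.broker.list": "broker1:9092"})

# Each record is a (key, value) pair; keep the value and land it in HDFS.
stream.map(lambda kv: kv[1]).saveAsTextFiles("/data/raw/purchases")

ssc.start()
ssc.awaitTermination()
```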
Confidential, PA
Big Data Developer
Responsibilities:
- Used Hortonworks Data Platform (HDP 2.2 and HDP 2.3) components to analyze and build data models.
- Used Stinger initiative features such as Tez, ORC, and vectorization in Hive 0.13 to develop optimized data pipelines.
- Worked with data science teams to build datasets suitable for modeling in R and SAS.
- Profiled various Hive tables for data quality.
- Leveraged various file formats for better query performance.
- Benchmarked various compression formats to measure query read and write performance.
- Experience in managing and reviewing Hadoop Log files.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing (see the export sketch after this list).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce/Tez jobs.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data with MapReduce, Hive, and Pig.
- Created partitioned tables in Hive and mentored analysts and the SQA team in writing Hive queries.
- Imported data from Oracle and flat files into HDFS with Sqoop.
- Exported the analyzed data to relational databases using Sqoop for visualization and generated reports for BI team.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Implemented various Hive scripts for further analysis and calculated metrics used for downstream reporting.
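A minimal sketch of the Sqoop export step above, wrapped in Python so it can be scheduled alongside the rest of a pipeline; the connection string, credentials path, table, and HDFS directory are hypothetical.

```python
# Hypothetical wrapper around the Sqoop export described above: pushes a Hive
# result set stored in HDFS into a MySQL reporting table. The host, database,
# table, and path names are placeholders.
import subprocess

sqoop_export = [
    "sqoop", "export",
    "--connect", "jdbc:mysql://reportdb:3306/analytics",
    "--username", "etl_user",
    "--password-file", "/user/etl/.mysql_pw",   # keeps credentials off argv
    "--table", "daily_metrics",
    "--export-dir", "/apps/hive/warehouse/analytics.db/daily_metrics",
    "--input-fields-terminated-by", "\\001",    # Hive's default delimiter
]
subprocess.run(sqoop_export, check=True)        # raises on non-zero exit
```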
Environment: RHEL 6, CentOS 6, Hortonworks Data Platform (2.2 and 2.3), Apache Hadoop 2.6, Apache YARN 2.6, Apache Hive 0.13, Apache Spark 1.6.x, HBase
Confidential, IL
Java Developer
Responsibilities:
- Involved in designing new riders and products.
- Coded key modules such as the calculation engine.
- Reviewed code, unit test cases, and system integration test cases.
- Proposed a rewrite of the Sales Illustration calculation engine.
- Proposed and implemented automation of manual work via the SI Admin web UI, reducing manual code development, deployment, testing, and release effort by 30%.
Environment: Java, JSP, Servlets, Struts, SOAP, CSS, Oracle, WebLogic 9.0, MyEclipse